Pancreatic islets are a unique tissue with highly specialized functions that determine many of the major metabolic responses to fasting and feeding in higher mammals. In the fed state, a robust secretion of insulin occurs, followed by a compensatory increase in insulin biosynthesis. During prolonged insulin resistance, such as that occurring with obesity, compensatory hyperinsulinaemia emerges, associated with islet hypertrophy and hyperplasia [1]. For many years, attempts to dissect the molecular steps involved in these critical adaptive responses of pancreatic islets have relied on analysis of the expression of individual genes through sequencing of cDNA clones and Northern blot analysis of islet mRNA. Later, more comprehensive means of monitoring expression patterns in islets under various experimental conditions were introduced, such as subtractive hybridization [2] and differential display of mRNA [1, 3, 4]. These methods provided important insights into differences in expression levels under various experimental conditions, yet did not provide a global analysis of gene expression.

With the development of human genomics, two major means of examining global gene expression have emerged: quantitative microarray analysis and serial analysis of gene expression (SAGE) [5, 6]. Several studies have recently documented the utility of microarray analysis in exploring a number of experimental issues in diabetes [7]. In contrast, while SAGE analysis has been applied in various other fields of biomedical research [8, 9], this methodology has not been used in the study of diabetes. SAGE analysis provides a method of examining comprehensive gene expression in a tissue of interest in a quantitative fashion that does not rely on prior knowledge of transcripts or on variable hybridization conditions [5, 6]. This method uses a short stretch of cDNA sequence that is sufficient to identify specific mRNAs. To facilitate efficient DNA sequencing, these short sequences, or sequence “tags”, are concatemerized into cDNA clones, and the number of tags for a given transcript represents the relative expression level of the gene. The method uses poly-A RNA as the template for cDNA synthesis; the cDNA is then cleaved with an anchoring restriction endonuclease such as NlaIII, which cuts at a four-base site (CATG) found in most cDNAs. The cDNAs are then ligated to linkers containing a type IIS restriction enzyme site and digested into small tags, usually of 14 base pairs, with the enzyme BsmFI. These tags are ligated to form ditags, followed by concatemerization and subsequent cloning. Thus, cDNAs with numerous small, easily identifiable sequence tags of 14 bp each, preceded by the NlaIII site, can be obtained in a single clone to facilitate sequencing. As a result, sequencing as few as 1000 clones can generate enough sequence tags to obtain a broad view of the relative level of expression of the most abundant transcripts in a particular tissue.
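The tag-counting principle described above can be illustrated with a short sketch. It assumes that each tag consists of the 10 bases following a CATG anchor (14 bp including the anchor) and that a ditag between two anchors is 20 to 26 bases long; the function names and length limits are illustrative assumptions and do not reproduce the SAGE 2000 software.

```python
from collections import Counter

ANCHOR = "CATG"   # NlaIII recognition site
TAG_LEN = 10      # bases kept after the anchor (assumption: 14 bp including CATG)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_tags(concatemer, min_ditag=20, max_ditag=26):
    """Split one concatemer read at anchor sites and return the tags it contains.

    Each piece flanked by two anchors is treated as a ditag: the first tag is
    read from its 5' end, the second from the reverse complement of its 3' end.
    Pieces outside the assumed ditag length range are skipped.
    """
    tags = []
    pieces = concatemer.upper().split(ANCHOR)
    for ditag in pieces[1:-1]:  # interior pieces have an anchor on both sides
        if not (min_ditag <= len(ditag) <= max_ditag):
            continue
        tags.append(ditag[:TAG_LEN])
        tags.append(reverse_complement(ditag[-TAG_LEN:]))
    return tags


def count_tags(concatemers):
    """Accumulate tag counts over many concatemer reads."""
    counts = Counter()
    for read in concatemers:
        counts.update(extract_tags(read))
    return counts
```

The resulting counts per tag are the raw "digital" expression values; real pipelines additionally remove linker sequences and duplicate ditags, which this sketch omits.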

In the current experiments, tissues from multiple pancreatic donors were pooled to construct SAGE libraries. To identify an expression profile of pancreatic islets, three libraries were constructed. These included (i) islets isolated by a standard gradient centrifugation protocol often utilized to prepare islets for transplantation [10], (ii) islets isolated by gradient centrifugation followed by further hand selection, and (iii) exocrine pancreas from which islets had been removed. Analysis of a total of 48 915 sequence tags obtained from the three libraries revealed expression profiles that could be compared with each other, and with SAGE libraries from other tissues. Additionally, by a digital subtraction of the exocrine pancreatic contaminants, the relative level of expression of more than 2000 of the most abundant transcripts expressed in human islets is reported. These data provided us with a comparative overview of the distribution of the molecular functions of transcripts in islets and exocrine tissue. We were also able to identify major islet transcripts located in chromosomal regions involved in linkage to diabetes.

Materials and methods

Tissues

SAGE libraries were constructed from three tissue sources. The first SAGE library, SAGEHISL1, was constructed with pancreatic islets from three normal human donors (HR39, HR46, HR50, Islet Isolation Core Facility at Washington University School of Medicine). Individual islets were isolated using a standard protocol of an intraductal Liberase perfusion, gentle mechanical dissociation, and a continuous gradient of Hypaque Euroficoll on a refrigerated COBE 2991 [10]. The islets, collected in fractions, were assayed for purity and fractions with a purity of greater than 90% were stored at −80°C for RNA extraction. The second library was prepared from islets from a single donor, HR96, prepared using the same protocol but with an additional purification by hand selection of 300 pancreatic islets (SAGEHR96R). The third SAGE library, SAGEHEXO1, was constructed with normal pancreatic exocrine cells, which were obtained during the process of islet purification from two donors (HR18, HR27). Informed consent was obtained for all tissue donors and the Washington University Human Studies Committee approved the study.

SAGE library construction and DNA sequencing

Total RNA was isolated using Trizol reagent (Invitrogen, Carlsbad, Calif., USA), followed by an RNA cleanup protocol with DNase treatment using the Qiagen RNeasy kit (Qiagen, Valencia, Calif., USA) to remove any remaining genomic DNA or degraded RNA. The RNA concentration of the samples was measured spectrophotometrically at 260/280 nm, and 1 µg of sample was run on a 1% non-denaturing agarose gel to assess quality.

The SAGE libraries were constructed with an I-SAGE system (Invitrogen) according to the manufacturer’s instructions, with minor modifications (Anchor enzyme: NlaIII). 10 µg of total RNA was used as a starting material. For the pooled SAGE libraries (SAGEHISL1, SAGEHEXO1), equal amounts of total RNA from different donors were mixed to add up to 10 µg. After ligating the DiTag concatamers into the SphI site of pZero-1 (Invitrogen), electro-transformation into Electro-MAX DH10B competent cells (Invitrogen) was performed using Gene Pulser system (BioRad, Hercules, Calif., USA). Bacterial colonies from each library were handpicked and arranged into nineteen 96-well plates (1824 colonies).

Sequencing reactions were carried out using ABI BigDye terminator mix and were loaded and run on an ABI3700 automated DNA sequencer. The sequencing chromatograms were analyzed using the Phred base-caller [11, 12] with high quality right and left cut-off positions determined. The high quality region then was screened for vector-adaptor sequence using local alignments. The vector trimming was confirmed by blasting against a database of all vectors. Sequence was screened against a number of contaminant databases (structural RNA, non-self mitochondria, and bacterial sequence). There also was a check for low entropy sequence and for computer processing errors. Finally, the data was parsed into standard dbEST format, detailed at http://ncbi.nlm.nih.gov/dbEST/. The methods for the construction of the SAGE libraries are illustrated in Fig. 1.

Fig. 1 Schematic representation of the construction of a SAGE library. The diagram represents the method used to construct a SAGE library from an islet preparation. α cells are represented in red and beta cells in green

SAGE library analysis

SAGE sequences were extracted and analyzed using the SAGE 2000 software, version 4.12 (available at http://www.sagenet.org/sage_protocol.htm), and the tags were then mapped using the “reliable” SAGEmap tag-to-gene mapping database (build 160), available through http://www.ncbi.nlm.nih.gov/SAGE/ and ftp://ftp.ncbi.nih.gov/pub/sage/map/Hs/NlaIII/.

For the assessment of the relative exocrine content within the islet libraries (SAGEHISL1, SAGEHR96R), the contamination factor (\(\beta_{i}\)) was estimated from the proportions of the tags corresponding to 13 well-characterized exocrine-specific genes. The individual \(\beta_{i}\) was calculated as \(\beta_{i} = \frac{n_{bi}}{N_{b}} \times \frac{N_{a}}{n_{ai}}\), where \(n_{bi}\) and \(n_{ai}\) are the numbers of tags for the exocrine-specific tag \(i\) in the islet and exocrine libraries, respectively, and \(N_{b}\) and \(N_{a}\) are the total numbers of tags in the islet and exocrine libraries, respectively.
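The contamination factor can be computed directly from the tag counts. The following is a minimal sketch, assuming the counts are held in dictionaries keyed by tag sequence; the function name, argument names, and the decision to skip markers absent from the exocrine library are illustrative assumptions.

```python
from statistics import mean


def contamination_factors(islet_counts, exocrine_counts,
                          islet_total, exocrine_total, marker_tags):
    """Return beta_i = (n_bi / N_b) * (N_a / n_ai) for each exocrine marker tag.

    islet_counts / exocrine_counts: dict mapping tag -> raw count
    islet_total / exocrine_total:   total tags in each library (N_b, N_a)
    marker_tags:                    tags of the exocrine-specific genes
    """
    factors = []
    for tag in marker_tags:
        n_b = islet_counts.get(tag, 0)
        n_a = exocrine_counts.get(tag, 0)
        if n_a == 0:
            # Assumption: a marker not seen in the exocrine library is skipped,
            # because the ratio is undefined.
            continue
        factors.append((n_b / islet_total) * (exocrine_total / n_a))
    return factors


# Toy example with invented counts (not data from this study).
islet = {"marker1": 10, "marker2": 5}
exocrine = {"marker1": 40, "marker2": 20}
betas = contamination_factors(islet, exocrine, 1000, 1000, ["marker1", "marker2"])
beta_bar = mean(betas)  # averaged contamination factor
```

Averaging over several marker genes reduces the influence of any single marker whose expression differs between donors.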

To provide an evaluation of the gene expression profile of the islet libraries deprived of exocrine content, digital subtraction was done tag by tag using the calculated contamination proportion and the relative abundance of each tag in the exocrine and the islet library, respectively. The following calculation was used: \(P_{b'i} = \left( 1 + \frac{\bar{\beta}}{1 - \bar{\beta}} \right) \cdot P_{bi} - \left( \frac{\bar{\beta}}{1 - \bar{\beta}} \right) \cdot P_{ai}\), where \(P_{ai}\) is the proportion of tag \(i\) in the exocrine library, \(P_{bi}\) is the proportion of tag \(i\) in the islet library, \(P_{b'i}\) is the proportion of tag \(i\) in the subtracted islet library, and \(\bar{\beta}\) is the averaged contamination factor.
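The subtraction formula can be applied tag by tag as in the sketch below. The function name and the clamping of negative results to zero are assumptions made for illustration; the text does not state how negative proportions were handled.

```python
def subtract_exocrine(islet_counts, exocrine_counts, beta_bar):
    """Return subtracted proportions P_b'i for every tag in the islet library.

    P_b'i = (1 + k) * P_bi - k * P_ai, with k = beta_bar / (1 - beta_bar).
    """
    if not 0 <= beta_bar < 1:
        raise ValueError("beta_bar must lie in [0, 1)")
    n_islet = sum(islet_counts.values())
    n_exocrine = sum(exocrine_counts.values())
    k = beta_bar / (1 - beta_bar)

    subtracted = {}
    for tag, count in islet_counts.items():
        p_b = count / n_islet
        p_a = exocrine_counts.get(tag, 0) / n_exocrine
        p_sub = (1 + k) * p_b - k * p_a
        # Assumption: proportions cannot be negative, so clamp at zero.
        subtracted[tag] = max(p_sub, 0.0)
    return subtracted


# Toy example: an islet library that is 25% exocrine by construction.
islet = {"insulin_tag": 75, "exocrine_tag": 25}
exocrine = {"exocrine_tag": 100}
result = subtract_exocrine(islet, exocrine, 0.25)
```

In this toy case the purely exocrine tag is reduced to zero and the islet tag is rescaled to the full library, so the subtracted proportions still sum to one.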

Estimation of transcripts specific to islets

Since their tags were not subtracted and the total number of tags diminished by removing the exocrine tags, transcripts whose proportion was increased through the subtraction were considered to be specific to the islets. Genes increased through the subtraction by 25.2%±(2×0.8%) (by over 23.6%) in the gradient purified islet library (HISL1) and 11.4%±(2×1.7%) (by over 8.0%) in the hand-selected islet library (HR96R) could be considered as relatively specific to the islets, or at least highly enriched in islets compared to the exocrine library (HEXO1).
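One way to read this criterion is as a threshold on the relative increase of a tag's proportion after subtraction, set at the mean contamination factor minus twice its reported error. The sketch below follows that reading; the function name and the interpretation of the ± value as an error term are assumptions.

```python
def is_islet_enriched(p_before, p_after, beta_bar, beta_err):
    """Return True if the relative increase exceeds beta_bar - 2 * beta_err.

    p_before: proportion of the tag in the islet library before subtraction
    p_after:  proportion of the tag after digital subtraction
    """
    if p_before <= 0:
        return False
    threshold = beta_bar - 2 * beta_err
    relative_increase = (p_after - p_before) / p_before
    return relative_increase > threshold


# Thresholds implied by the values quoted in the text.
hisl1_threshold = 0.252 - 2 * 0.008   # 23.6% for the gradient-purified library
hr96r_threshold = 0.114 - 2 * 0.017   # 8.0% for the hand-selected library
```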

Annotation, gene ontology functions and chromosome locations

All Gene Ontology annotations and chromosome and cytoband locations were gathered using the UniGene cluster IDs collected through the SAGEmap annotation for each transcript from the SOURCE database [13] (http://source.stanford.edu/). The different Gene Ontology categories were further grouped for the molecular functions using the classifications described by the Gene Ontology Consortium [14] (http://www.geneontology.org/). Some transcripts bear several functions and were therefore counted in every category to which they corresponded. The number of tags corresponding to a transcript found in a category were cumulated to represent the relative importance of each gene ontology class.
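The accumulation of tag counts per functional class can be sketched as follows. The sketch adds a transcript's count once for each annotated function, which is one reading of the description; the data structures, function name, and example annotations are hypothetical.

```python
from collections import defaultdict


def accumulate_classes(gene_tpm, gene_functions, function_to_class):
    """Sum tag abundance per top-level molecular function class.

    gene_tpm:          dict gene -> abundance (e.g. tags per million)
    gene_functions:    dict gene -> list of annotated molecular functions
    function_to_class: dict function -> top-level class
    A gene with several functions contributes to every matching class.
    """
    totals = defaultdict(float)
    for gene, abundance in gene_tpm.items():
        for function in gene_functions.get(gene, []):
            top_class = function_to_class.get(function)
            if top_class is not None:
                totals[top_class] += abundance
    return dict(totals)


# Hypothetical annotations, for illustration only.
gene_tpm = {"geneA": 100.0, "geneB": 50.0}
gene_functions = {"geneA": ["f1", "f2"], "geneB": ["f1"]}
function_to_class = {"f1": "binding", "f2": "signal transducer activity"}
class_totals = accumulate_classes(gene_tpm, gene_functions, function_to_class)
```

Because multi-function transcripts are counted more than once, the class totals can exceed the summed abundance of the annotated transcripts.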

Results

SAGE analysis of islet and exocrine pancreas libraries

To provide an estimate of mRNA expression levels in pancreatic islets, three human SAGE libraries were constructed. An exocrine pancreas SAGE library (HEXO1, two donors) was compared to those constructed from RNAs extracted from islets either isolated by gradient centrifugation (HISL1, three donors) or islets isolated by gradient centrifugation followed by further purification as a result of hand selection (HR96R, one donor). A total of 48 915 sequence tags were obtained from the three libraries. The entire data set for the libraries has been deposited in the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/GEO/) and can be retrieved from the SAGEmap (http://www.ncbi.nlm.nih.gov/SAGE/) and Endocrine Pancreas Consortium (http://www.cbil.upenn.edu/EPConDB/) web site. The following describes the contents of the libraries and the results of their analyses.

Only tags represented at least twice in a library were considered for analysis to minimize the effects of possible sequencing errors. To map tags to genes for the different libraries, we used the SAGEmap “reliable” tag-to-gene list (build #160). Among the tags used for subsequent analysis, several tags corresponded either to no gene or to several genes and could not be reliably identified. Between 6824 and 12 480 tags for the three libraries were found to be “reliable” as they had perfect matches to a single gene determined by SAGEmap. All of the tags with a reliable match represented known genes in UniGene (http://www.ncbi.nlm.nih.gov/UniGene/), although some are named while others are currently only ESTs. Multiple tags representing the same gene were then clustered to provide a cumulative count per transcript.
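The filtering and clustering steps in this paragraph can be sketched as below, assuming the tag counts and the tag-to-gene list are available as dictionaries. The function name and the returned structures are illustrative; the actual SAGEmap "reliable" list is a downloadable file whose format is not reproduced here.

```python
from collections import Counter


def cluster_by_gene(tag_counts, tag_to_gene, min_count=2):
    """Drop tags seen fewer than min_count times, then sum counts per gene.

    Returns (gene_counts, unmapped), where unmapped holds the retained tags
    that have no entry in the tag-to-gene mapping.
    """
    gene_counts = Counter()
    unmapped = {}
    for tag, count in tag_counts.items():
        if count < min_count:
            continue  # singletons may be sequencing errors
        gene = tag_to_gene.get(tag)
        if gene is None:
            unmapped[tag] = count
        else:
            gene_counts[gene] += count  # several tags may map to one gene
    return gene_counts, unmapped


# Invented tags and gene names, for illustration only.
tag_counts = {"AAAAAAAAAA": 5, "CCCCCCCCCC": 3, "GGGGGGGGGG": 1, "TTTTTTTTTT": 2}
tag_to_gene = {"AAAAAAAAAA": "gene1", "CCCCCCCCCC": "gene1", "GGGGGGGGGG": "gene2"}
gene_counts, unmapped = cluster_by_gene(tag_counts, tag_to_gene)
```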

Expression profiles in the exocrine and islet libraries

A total of 13 630 tags were sequenced for the exocrine library (HEXO1). After identification and elimination of all tags counted only once in the library, they corresponded to 1194 transcripts (1013 identified transcripts and 181 remaining different tags that could not be mapped). For the gradient purified islet library (HISL1), there were 1726 transcripts (1524 identified transcripts and 202 remaining unmapped different tags), and for the further purified hand selected islet library (HR96R), 1191 transcripts (1042 identified transcripts and 149 unmapped different tags). Amongst the identified tags, between 7.0% and 11.5% were derived from ribosomal or mitochondrial transcripts. While they are in the complete data sets deposited in the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/) that can also be retrieved through SAGEmap (http://www.ncbi.nlm.nih.gov/SAGE/), they are not listed in the subsequent tables presented here.

The relative abundance of the top 50 genes, represented by the number of tags for each of the three libraries, is shown in Tables 1, 2 and 3. The tables provide the original tag count and tags per million as normalization to compare abundance in the different pancreatic libraries, as well as abundance in other libraries available in SAGEmap. Most of the highly expressed genes in the exocrine library represented well known secreted acinar pancreatic enzymes or factors involved in protein synthesis (Table 1). Surprisingly, a gene whose function still remains unclear, regenerating islet-derived 1α, was the most highly represented among the top 50 exocrine transcripts.
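The tags-per-million normalization mentioned above is a simple rescaling of a raw count by the library size. The sketch below assumes the denominator is the total number of tags sequenced in the library; the text does not specify whether singleton tags were included in that total.

```python
def tags_per_million(count, library_total):
    """Scale a raw tag count to tags per million for a library of given size."""
    if library_total <= 0:
        raise ValueError("library_total must be positive")
    return count / library_total * 1_000_000


# Illustration: a hypothetical tag seen 100 times in a library of 13 630 tags.
example_tpm = tags_per_million(100, 13630)
```

Expressing counts on a common scale is what allows libraries of different sizes to be compared with each other and with other SAGEmap libraries.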

Table 1 The 50 most abundant identified transcripts with their relative levels of expression in the HEXO1 (exocrine) SAGE library. Only the top 50 non-ribosomal or non-mitochondrial transcripts within the library are shown in this table. The transcripts are ranked by their relative abundance and represented with their tag counts within each library and the corresponding tags per million to allow comparisons. When the tags could not be matched to any UniGene cluster ID, the name of the transcript has been replaced by its tag. The complete lists of the transcripts in these libraries can be retrieved through SAGEmap
Table 2 The 50 most abundant identified transcripts with their relative levels of expression in the HISL1 (ficoll gradient-purified islets) SAGE library. Only the top 50 non-ribosomal or non-mitochondrial transcripts within the library are shown in this table. The transcripts are ranked by their relative abundance and represented with their tag counts within each library and the corresponding tags per million to allow comparisons. When the tags could not be matched to any UniGene cluster ID, the name of the transcript has been replaced by its tag. The complete lists of the transcripts in these libraries can be retrieved through SAGEmap
Table 3 The 50 most abundant identified transcripts with their relative levels of expression in the HR96R (gradient-purified and handpicked islets) SAGE library. Only the top 50 non-ribosomal or non-mitochondrial transcripts within the library are shown in this table. The transcripts are ranked by their relative abundance and represented with their tag counts within each library and the corresponding tags per million to allow comparisons. When the tags could not be matched to any UniGene cluster ID, the name of the transcript has been replaced by its tag. The complete lists of the transcripts in these libraries can be retrieved through SAGEmap

In the gradient purified islet library (Table 2), while insulin was seen as the most highly expressed gene, many exocrine transcripts were observed as well, presumably due to exocrine contamination during isolation. In contrast, in the gradient purified library that was further hand-selected, the relative exocrine content appeared to be diminished (compare Tables 2 and 3). Notably, many stress response related transcripts like reg-1α and β, clusterin or heat-shock proteins were also present at high levels in the different libraries, perhaps reflecting stress the cells were subjected to during the preparation of the tissues [15, 16, 17].

Estimation of expression profiles of the islet libraries after subtraction of the exocrine tags

To assess the relative exocrine content within the islet libraries, the proportions of the tags corresponding to known exocrine genes were analyzed. Since the relative expression of these transcripts can vary from individual to individual, and the three libraries were constructed from different donors, the analysis was based on 13 relatively abundant known exocrine transcripts. As no exocrine tag is supposed to be present in the islet libraries, the mean contamination factor was calculated as the mean for these 13 known exocrine genes (see Methods). The use of this number of different exocrine genes also helped to decrease the chances of over/under estimation of the exocrine content that could derive from the lack of precision of genes represented only by a few copies. These calculations estimated that 25.2%±0.8% of the tags from the gradient-purified islet library were due to exocrine contamination. The gradient-purified and hand selected islet library showed a much lower content with only 11.4%±1.7% of its tags represented as exocrine transcripts (Table 1S, Supplementary Materials).

To provide estimates of gene expression profiles of the islet libraries deprived of exocrine contamination, digital subtraction was performed tag by tag using the calculated contamination proportion and the relative abundance of each tag in the exocrine and the islet libraries, respectively. The consistency of the calculations was assessed by comparing the results of the subtraction for the two islet libraries against each other. Comparing the expression levels of each individual gene in the two libraries, a linear regression of expression levels in HR96R and HISL1 showed a very high correlation, with R²=0.9664. Subsequently, merging the results of the two islet libraries provided estimates of relative levels of gene expression from four donors. The two libraries were merged using tags per million as a normalization factor, giving a weight of three to the gradient purified islet library (HISL1, three donors) and a weight of one to the hand selected islet library (HR96R, one donor). The top 50 transcripts in the merged islet library after subtraction are presented in Table 4, and the complete list of 2180 transcripts is tabulated by abundance in the Supplementary Materials (Table 2S).
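The consistency check and the weighted merge described above can be sketched as follows. The coefficient of determination is computed for a simple linear regression, and the merge is written as a donor-weighted mean of tags per million; treating the merge as a weighted mean is an assumption, as are the function names and the example values.

```python
def r_squared(x, y):
    """Coefficient of determination for a simple linear regression of y on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each have non-zero variance")
    return sxy ** 2 / (sxx * syy)


def merge_libraries(tpm_a, tpm_b, weight_a=3, weight_b=1):
    """Donor-weighted mean of two tags-per-million profiles.

    Tags missing from one library are treated as zero in that library.
    """
    total_weight = weight_a + weight_b
    tags = set(tpm_a) | set(tpm_b)
    return {
        tag: (weight_a * tpm_a.get(tag, 0.0) + weight_b * tpm_b.get(tag, 0.0))
        / total_weight
        for tag in tags
    }


# Invented values, for illustration only.
hisl1 = {"tagA": 400.0, "tagB": 100.0}
hr96r = {"tagA": 800.0}
merged = merge_libraries(hisl1, hr96r)  # weights 3 (HISL1) and 1 (HR96R)
```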

Table 4 List of the 50 most abundant transcripts (non mitochondrial and non ribosomal) within the exocrine-subtracted merged islet library. The transcripts are ranked according to their relative abundance using the counts in tags per million. The complete lists of the transcripts in this library can be found in Table 2S of the supplementary material

The apparent level of the exocrine transcripts remaining after subtraction was decreased (for example, compare Tables 2 and 3 with Table 4). The validity of the analysis is suggested by inspection of the genes listed in Table 4, as many of these most abundant transcripts have been previously described as selectively expressed in islets. These include insulin, transthyretin [18], glucagon, the pro-protein convertase inhibitor (proSAAS [19, 20]), clusterin [21], carboxypeptidase-E [22] and islet amyloid polypeptide. Further, while some of the genes found among the most abundant transcripts had previously been described as expressed in pancreatic islets, their relative abundance was unknown, such as secretory granule neuroendocrine protein 1 (7B2) [23] and chromogranin B [24]. Interestingly, 12 tags from the most abundant 50 transcripts could not be clearly identified through SAGEmap, as they were either “no match” or the tags mapped to several genes. These could represent tags not previously described for known genes, or tags for completely novel genes.

Amongst the transcripts remaining after subtraction and using the criteria defined in the Methods, we determined that there were respectively 1090 and 777 genes considered as islet enriched in these two libraries, whereas the remaining genes are composed of either exocrine transcripts or genes common to both tissues. The complete lists of all transcripts considered here to be islet enriched are tabulated by abundance in the Supplementary Materials (Tables 3S and 4S, respectively).

GO functions for the exocrine (HEXO1) and merged islet libraries

For all identified transcripts in the exocrine (HEXO1) and merged islet libraries, all available gene ontology functions were gathered and regrouped into the major top-level classes defined for the molecular functions by the Gene Ontology consortium (http://www.geneontology.org/). Within these two libraries, 1179 (merged islet library—representing 416 772 tags per million) and 590 (HEXO1—representing 470 829 tags per million) transcripts had gene ontology functions. These molecular functions represent the various tasks carried out by the proteins. The full range of tasks represented in these libraries represents hundreds of possibilities that have been regrouped according to the 19 major top level classes to simplify this diagram (Fig. 2). The three major categories represented here for both libraries are “Binding activities”, representing several classes of mechanisms comprising ligands, protein binding, but also DNA binding activities, “Signal transducer activity” representing different members of the signal transduction pathways and “Catalytic activities”, comprising various enzymes and signal transducers (kinases, phosphatases...). The hierarchy of the molecular functions can be consulted at the gene ontology web-site (http://www.geneontology.org/).

Fig. 2 Distribution of the molecular functions in the merged islet and the exocrine libraries. The molecular functions of all identified transcripts were collected for the merged islet library and the exocrine library. The different molecular functions were regrouped into 19 major top-level classes. For each function found for a transcript, the tag count for the gene was added to the corresponding class (some transcripts bear several functions in the same or in different classes). The cumulative tag counts for each class are represented here in tags per million in two concentric pie charts for the merged islet library (inner circle) and the exocrine library (outer circle)

Relationship of gene expression in the islets and chromosomal location of linkage peaks for Type 2 diabetes

Genome scans of families with multiple members affected with diabetes have identified chromosomal regions likely to harbour genes that contribute to disease risk. Chromosome 1q, 12q and 20q have previously been identified as carrying four regions most reproducibly shown to harbour Type 2 diabetes mellitus genes [25]. Islet genes identified through SAGE and mapped to these regions are presented in Tables 5, 6 and 7 and ranked according to their chromosomal locations. The genes associated with the highest linkage peaks have been highlighted.

Table 5 Islet transcripts expressed in the Type 2 diabetes mellitus regions on Chromosome 1q. The transcripts identified as being expressed in the islets are organized according to their chromosome locations for chromosome 1q. Regions thought to harbour Type 2 diabetes mellitus genes have been highlighted for the transcripts located between 1q12 and 1q23.2
Table 6 Islet transcripts expressed in the Type 2 diabetes mellitus regions on Chromosome 12q. The transcripts identified as being expressed in the islets are organized according to their chromosome locations for chromosome 12q. Regions thought to harbour Type 2 diabetes mellitus genes have been highlighted for the transcripts located between 12q13.12 and 12q15
Table 7 Islet transcripts expressed in the Type 2 diabetes mellitus regions on Chromosome 20q. The transcripts identified as being expressed in the islets are organized according to their chromosome locations for chromosome 20q. Regions thought to harbour Type 2 diabetes mellitus genes have been highlighted for the transcripts located between 20q11.21 and 20q13.13

Discussion

This study uses SAGE analysis for human islet mRNA. From three human libraries, nearly 50 000 sequence tags were deposited in public databases (SAGEmap, http://www.ncbi.nlm.nih.gov/SAGE/). These data record the relative levels of expression of the most abundant human exocrine and endocrine pancreatic genes. Interestingly, this is the first exocrine SAGE library from normal tissue. A distinct advantage of submitting a tissue to SAGE analysis is that the digital readout of gene expression can be directly compared to those of several hundred libraries created from other human tissues such as brain, liver, skeletal muscle, etc. With software provided through SAGEmap, one can readily examine genes common to these tissues, as well as the relative level of enrichment of particular genes in specialized tissues by means of, for example, “virtual Northern blots”. An integrated analysis of the RNA abundance in the islets and chromosomal locations can also allow combining gene expression and linkage analysis for a disease. The study of candidate factors in chromosomal regions associated with Type 2 diabetes can be filtered according to their expression in the islets.

Over a fifth of the transcripts in ficoll-gradient purified islet libraries came from exocrine mRNAs (Table 4). After digital subtraction of exocrine transcripts from both islet libraries, 2180 genes expressed in islets were identified. The analysis of the libraries, after subtraction, depicted the relative expression level of the most abundant transcripts for three islet donors for the gradient purified library and one donor for the gradient purified and subsequently hand selected islet library. As gene expression varies from individual to individual, these variations can affect the relative “ranks” of the transcripts identified. To assess this issue, a linear regression analysis between the two subtracted islet libraries was performed. It revealed a high correlation (R²=0.9664), which allowed us to merge the libraries and obtain values more representative of human adult islets, as the results represent a composite from four donors.

The results of the SAGE analysis for islets shown in Table 4 are supported by the observation of well-recognized abundant islet gene products such as insulin, glucagon, islet amyloid polypeptide, and pancreatic polypeptide. These findings also concur with the findings of a large endocrine pancreas EST sequencing project [Endocrine Pancreas Consortium (EPCon)], http://www.cbil.upenn.edu/EPConDB/index.shtml. Human and mouse pancreas, islet, and insulinoma libraries were constructed and over 170 000 ESTs submitted to the public databases [26]. These ESTs were derived from a mixture of developmental stages of pancreas, not exclusively islet transcripts, and for these reasons cannot be compared directly to the two islet SAGE libraries presented here. Many of the most abundant genes in the current SAGE analysis such as insulin, transthyretin, glucagon, beta-2-microglobulin, carboxypeptidase-E or islet amyloid polypeptide, however, were also found to be amongst the most abundant transcripts within the EPCon libraries.

While SAGE analysis is unique in its ability to quantify gene expression in a given tissue, there are several limitations for the analysis of the data [6]. For example, SAGE generates tags from the 3′-most NlaIII restriction site, and only for those mRNAs that contain such a site. In addition, tag-to-gene mapping is not completely definitive, as some tags correspond to several genes. SAGEmap provides two lists allowing the mapping of tags to genes. The most complete list describes all possible tag-to-gene mappings, and the second list provides the most common and reliable tag-to-gene mappings. To clarify the analysis, we decided to use the more conservative second list. While more reliable, this limits the information obtained from this type of analysis, since only about 60% of the tags from the three libraries could be identified with these conservative criteria. This number could be increased as new human ESTs are entered into the public databases. As a result of the large number of ESTs recently contributed by the Endocrine Pancreas Consortium (http://www.cbil.upenn.edu/EPConDB/index.shtml), several tags have been mapped that previously had no match to known genes in UniGene.

The difficulties in collecting healthy human tissues limited the quantity of islets and RNA that was obtained to build pancreatic islet SAGE libraries. Even though these libraries are large enough to provide valuable information about the most abundant transcripts in the pancreas, they were not large enough to analyze the relative expression levels of low abundance transcripts. Oligonucleotide arrays provide relative intensities of transcripts that can allow comparison of the expression level of one gene from individual to individual or under various conditions such as presence or absence of disease, metabolic conditions, or nutritional state for example. Unfortunately, the relative intensity of different transcripts within a tissue is difficult to interpret on an array because each gene has different hybridization characteristics. Thus, while it is far more difficult to create a SAGE library compared to performing a microarray experiment, SAGE analysis was necessary to provide us with relative abundance levels of transcripts in pancreatic islets. Regarding the precision of quantitative estimates of transcripts by SAGE, while more accurate for abundant transcripts, non-abundant transcripts with few tags yield only relative levels of expression. There are, however, no methods currently available that provide sufficient precision for expression levels of low abundance transcripts. Recently, Lynx Therapeutics (Lynx Therapeutics, Hayward, Calif., USA) has proposed a method called Massively Parallel Signature Sequencing (MPSS) measuring transcript abundance through a digital approach in which over a million transcripts can be counted simultaneously and which uses longer tags (17 bases) than in typical SAGE libraries. This new method could provide more extensive measurement of the expression of islet transcripts and provide information about low abundance transcripts as well [27].

The mechanical and enzymatic treatment required to isolate the different pancreatic fractions could induce expression of stress proteins not associated with normal tissue. This stress might be reflected in the current libraries through the very high expression level of several factors such as reg-1α and -1β, Hsp 70, and clusterin. Reg-1α is known to be expressed in both exocrine cells [16] and regenerating islets [15, 28, 29, 30]. The abundance of reg in pancreatic juice (10 to 14% of total protein) suggests that it plays an important role in exocrine pancreatic function. Its expression level is known to rise drastically in acute pancreatitis. While its expression was increased in regenerating pancreas, suggesting one of its names, its actual role as a proliferation factor for islets is currently unclear [31]. Even though clusterin has been described to be expressed in both exocrine [32] and endocrine cells [33] during cell injuries, its expression in our libraries seemed restricted to the islets.

A number of interesting observations were made from the merged islet libraries (Table 4). The expression of many secreted endocrine factors like insulin, glucagon, chromogranin B, islet amyloid polypeptide and pancreatic polypeptide was prominent. Intriguingly, the fourth most abundant islet transcript tag had no match in the SAGEmap databases and could not be identified with the current tag-to-gene mapping available through SAGEmap. As EST databases expand, additional tags adjacent to NlaIII sites close to poly-A tails will be identified and contributed to SAGEmap, allowing identification of these unknown transcripts. An mRNA encoding a protein involved in prohormone processing and previously shown to be expressed in islets [19, 20], proprotein convertase subtilisin/kexin type 1 inhibitor, also known as proSAAS, was an abundant islet transcript. Another abundant transcript represents secretory granule neuroendocrine protein 1, or the 7B2 protein. Both proSAAS and 7B2 are proteins involved in hormone processing and are expressed in the brain as well as in neuroendocrine cells. ProSAAS is an inhibitor of prohormone convertase 1 activity [19], whereas 7B2 is a specific chaperone for proprotein convertase-2 that keeps the enzyme transiently inactive in vivo [34]. Mice homozygous for a null mutation in the 7B2 gene had no demonstrable PC2 activity and displayed hypoglycaemia, hyperproinsulinaemia, and hypoglucagonaemia [35].

Other abundant mRNAs identified in this study include the recently described secretagogin, a cytoplasmic protein with six putative EF-hand calcium-binding motifs [36, 37]. Its expression in pancreas is specific to the islets, and it is thought to be involved in KCl-stimulated calcium flux and the regulation of cell proliferation. The current study highlights the potential importance of this newly described islet protein, whose function has been little studied. Other, less abundant islet transcripts identified included protein tyrosine phosphatase, receptor-type N, also known as IA2 or islet cell antigen 512. It was discovered through the screening of a human islet library for clones encoding proteins reactive with sera from patients with Type 1 diabetes mellitus [38]. It was reported that 48% of Type 1 diabetes mellitus patients had antibodies directed to this islet antigen. The p57 (KIP2) protein is a genomically imprinted inhibitor of cyclin/Cdk complexes with an N-terminal CDK inhibitory domain highly similar to that of p21 (CIP1). Its implication in both sporadic cancers and Beckwith-Wiedemann syndrome makes it a tumor suppressor candidate. Since islet regeneration is a promising area of research, the abundance of this CDK inhibitor suggests that it could play an important role in this process. Interestingly, the gene encoding this factor is located in an imprinted domain on chromosome 11p15 that also comprises IGF2 and H19, which seem to share the same tissue-specific expression and imprinting pattern [39].

An analysis of the molecular functions represented in the islets showed that the most represented function corresponded to “binding activity” (Fig. 2). Approximately 35% of the molecular functions of the transcripts in the merged islet libraries are classified within this category. Factors contributing to this category include, for instance, insulin, transthyretin and glucagon. The next most abundant islet function was “catalytic activity”, accounted for mostly by hormone processing enzymes, followed by “signal transducer activity” (11% in the islets compared with only 1.1% in the exocrine library), suggesting the importance of responses to external signaling for islet function. Not surprisingly, the most common molecular function in the exocrine tissue (greater than 67%) corresponds to catalytic activity.
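
The category percentages discussed above can be illustrated with a minimal sketch. It assumes that each annotated tag carries a single molecular function category and that shares are weighted by tag counts; the text does not state how the percentages in Fig. 2 were derived, so this is only one plausible reading. The function name `function_shares`, the tag identifiers and all counts are hypothetical and are not data from this study.

```python
from collections import defaultdict


def function_shares(tag_counts, tag_to_function):
    """Return the percentage of annotated tag counts in each category.

    Assumption: one category per tag, shares weighted by tag count.
    Tags without an annotation are excluded from the denominator.
    """
    totals = defaultdict(int)
    for tag, count in tag_counts.items():
        category = tag_to_function.get(tag)
        if category is None:
            continue  # tag with no known function is skipped
        totals[category] += count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {cat: 100.0 * n / grand_total for cat, n in totals.items()}


# Hypothetical toy library: identifiers and counts are invented.
toy_counts = {"TAG_A": 60, "TAG_B": 30, "TAG_C": 10, "TAG_D": 25}
toy_annotation = {
    "TAG_A": "binding",
    "TAG_B": "catalytic activity",
    "TAG_C": "signal transducer activity",
    # TAG_D deliberately left unannotated
}

shares = function_shares(toy_counts, toy_annotation)
for category, percent in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{category}: {percent:.1f}%")
```

Weighting by tag count emphasises highly expressed transcripts; counting each distinct transcript once would instead describe the diversity of functions, and the two approaches can give quite different percentages.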

In summary, these studies of human exocrine and endocrine pancreas libraries now provide, through SAGEmap (http://www.ncbi.nlm.nih.gov/SAGE/) and the Endocrine Pancreas Consortium (http://www.cbil.upenn.edu/EPConDB/), transcript maps cataloging the relative levels of expression of the most abundant islet genes. This information should serve a number of useful functions, including the monitoring of relative abundance of transcripts during islet neogenesis, and the classification of altered patterns of gene expression during various stages of islet beta-cell failure in the development of diabetes. These data can also be analyzed in parallel with data from any platform assessing RNA abundance, through the RNA Abundance Database platform used on the Endocrine Pancreas Consortium website.

Additionally, the tedious job of sifting through hundreds of genes within linkage peak regions in studies of the genetic basis of Type 2 diabetes could be facilitated by an addition to the currently used candidate gene approach, in which one criterion for selecting a candidate would be a relatively high level of expression in human pancreatic islets. Similar analyses could be conducted for chromosomal regions identified to harbour Type 1 diabetes mellitus susceptibility loci.
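
As a rough illustration of how islet expression could be added as a selection criterion, the sketch below filters the genes that overlap a linkage interval and ranks them by islet tag abundance. The `Gene` record, the `prioritize_candidates` function, the tags-per-million normalisation and the threshold are all assumptions made for this example; the gene symbols, coordinates and abundances are invented and do not come from the libraries described here.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    symbol: str
    chromosome: str
    start: int  # genomic start coordinate (hypothetical)
    end: int    # genomic end coordinate (hypothetical)
    islet_tags_per_million: float  # assumed normalised SAGE abundance


def prioritize_candidates(
    genes: List[Gene],
    chromosome: str,
    region_start: int,
    region_end: int,
    min_tpm: float,
) -> List[Gene]:
    """Keep genes overlapping the interval and expressed at or above
    min_tpm in islets, sorted from highest to lowest abundance."""
    in_region = [
        g for g in genes
        if g.chromosome == chromosome
        and g.start <= region_end
        and g.end >= region_start
    ]
    expressed = [g for g in in_region if g.islet_tags_per_million >= min_tpm]
    return sorted(
        expressed, key=lambda g: g.islet_tags_per_million, reverse=True
    )


# Hypothetical genes in and around an invented linkage interval.
toy_genes = [
    Gene("GENE_A", "chr1", 1_000, 5_000, 850.0),
    Gene("GENE_B", "chr1", 6_000, 9_000, 12.0),
    Gene("GENE_C", "chr1", 20_000, 25_000, 400.0),
    Gene("GENE_D", "chr2", 2_000, 4_000, 900.0),
    Gene("GENE_E", "chr1", 8_500, 12_000, 300.0),
]

ranked = prioritize_candidates(toy_genes, "chr1", 0, 10_000, min_tpm=100.0)
for gene in ranked:
    print(gene.symbol, gene.islet_tags_per_million)
```

A filter of this kind would only reorder candidates; genes with low islet expression, or those acting in other tissues, could still underlie a linkage signal, so the expression criterion is best treated as one input among several.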