The population of Malaysia comprises three major ethnic groups: Malays, Chinese and Indians. Endoscopy-based, sero-epidemiological and molecular studies conducted in the region have reported differences in the prevalence of Helicobacter pylori infection among these ethnic groups [1–4]. Further observations revealed differences in the distribution of non-ulcer dyspepsia (NUD), peptic ulcer disease (PUD) and (pre-)cancerous lesions among H. pylori-positive patients, with the Chinese consistently showing a higher rate of severe disease. Although Indians have a higher prevalence of H. pylori infection, the frequency of peptic ulcer disease among them was relatively low, a pattern that has not changed significantly since 1994 [1, 3]. Differences in the prevalence of H. pylori infection and gastroduodenal diseases among patients of different ethnic groups residing in the same geographic region may therefore provide insight into the pathogenesis of the infection.

The cagA gene encodes the cytotoxin-associated gene A (CagA) protein and is a marker for the cag pathogenicity island. The structure of cagA is diverse, especially at the 3′-terminus, and many studies have reported differences in this region between strains from East Asian and Western countries [4–7]. These studies revealed several distinct forms of CagA with an uneven geographical distribution, which may reflect differences in virulence among cagA-positive H. pylori strains. Our previous report showed that the overall prevalence of cagA-positive H. pylori isolates was 94% and that there was no significant association between cagA subtypes and gastroduodenal diseases [8]. Genetic variation in the cagA 3′-terminal region among Malaysian H. pylori strains has not previously been explored. In the present study, we characterise the cagA 3′-terminus in H. pylori strains isolated from dyspeptic patients of different ethnicities with various gastroduodenal diseases.

A total of 110 cagA-positive H. pylori isolates were obtained from gastric antrum biopsies of patients attending the Endoscopy Unit of Universiti Kebangsaan Malaysia Medical Centre (Kuala Lumpur, Malaysia). The patients (19 Malays, 68 Chinese and 23 Indians) were classified into four groups according to endoscopic findings and histopathology: normal stomach (n = 2), gastritis (n = 52), duodenitis (n = 10), gastric ulcer (n = 15), duodenal ulcer (n = 10), gastric and duodenal ulcer (n = 3), intestinal metaplasia (n = 23), atrophy (n = 2), dysplasia (n = 1) and gastroesophageal reflux disease (GERD) (n = 1). Patients were classified as having peptic ulcer disease when active gastric and/or duodenal ulceration, defined as a mucosal break of at least 0.5 mm in one dimension, was detected on endoscopy. Patients with previous ulcers were also included, given the relapsing and remitting course of ulcer disease.

Histological grading of severity and detection of H. pylori were performed routinely at the Department of Pathology. Biopsies were fixed in 10% formalin and embedded in paraffin, and sections were cut and stained with haematoxylin-eosin. When necessary, sections were also stained by the Warthin-Starry method for better visualisation of H. pylori. The slides were evaluated by an independent pathologist. Intestinal metaplasia, atrophy, dysplasia and gastric carcinoma were recorded and grouped as (pre-)cancerous lesions.

Biopsies were cultured for H. pylori on Columbia agar base (Oxoid) containing Dent’s supplement (Oxoid) and 7% ox blood. Plates were incubated at 37°C for five days under microaerophilic conditions. Cultures were confirmed as H. pylori on the basis of colony morphology, the presence of curved or spiral-shaped Gram-negative bacteria, and positive urease, catalase and oxidase tests. A single colony was picked from each culture plate and sub-cultured for DNA extraction.

Genomic H. pylori DNA was prepared using the High Pure PCR Template Preparation Kit (Roche, Mannheim, Germany) according to the manufacturer’s instructions. Length variation in the 3′-terminal region of the cagA gene was identified by polymerase chain reaction (PCR) using primers cag1 and cag3 as described previously [9]; the amplification products ranged from 500 to 800 base pairs (bp) in length. H. pylori ATCC 700824 (strain J99) and H. pylori ATCC 700392 (strain 26695) were used as positive controls, and ureC target sequences served as the internal control. PCR products were analysed on ethidium bromide-stained agarose gels. The cagA PCR products were purified using the MinElute Gel Extraction Kit (QIAGEN, Hilden, Germany) and cloned into pCR2.1 using TOPO TA cloning (Invitrogen, Carlsbad, CA). The cagA inserts were sequenced on both strands using M13 primers. Sequences were managed in BioEdit, and multiple sequence alignment was performed with Clustal W version 1.83. Nucleotide and amino acid sequences were compared pairwise using the Kimura distance method. The sequences analysed comprised three H. pylori sequences obtained from GenBank, namely strain 26695 (ATCC 700392), strain J99 (ATCC 700824) and the Japanese strain (accession no. AB246742), together with 110 sequences obtained from our H. pylori clinical isolates.
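The Kimura distance corrects the observed proportion of differences between two aligned sequences for multiple substitutions at the same site. The sketch below illustrates the two textbook forms: the two-parameter nucleotide distance d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitions and transversions, and Kimura's empirical protein distance d = -ln(1 - p - 0.2p²), where p is the proportion of differing residues. It is an illustration only: the function names are hypothetical, skipping gapped columns is an assumed convention, and the exact settings used in the software described above are not stated.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned nucleotide sequences.

    Columns with a gap or any character other than A, C, G, T in either
    sequence are skipped (an assumed gap-handling rule).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; all other changes are transversions.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    # math.log raises ValueError if the sequences are too divergent for the correction.
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def kimura_protein(seq1: str, seq2: str) -> float:
    """Kimura's empirical protein distance, d = -ln(1 - p - 0.2 p^2)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    # Assumed convention: ignore any column with a gap in either sequence.
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)
    return -math.log(1 - p - 0.2 * p * p)
```

For instance, two 10-nt sequences differing by a single A/G transition give d = -0.5 ln(0.8), slightly larger than the raw difference of 0.1, which is the expected effect of the correction.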

Statistical analysis was performed using the Statistical Package for the Social Sciences (SPSS). Differences in the distribution of cagA genotypes and associations between genotype, clinical outcome and patient ethnicity were analysed using the χ² test and Fisher’s exact test. A P-value of less than 0.05 was considered statistically significant.
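For 2 × 2 comparisons such as CagA type (A-B-D vs A-B-C) by ethnicity (Chinese vs non-Chinese), the two tests named above can be sketched in plain Python. This is a minimal illustration, not SPSS’s implementation: the Pearson statistic is computed without continuity correction, its P-value uses the χ² distribution with one degree of freedom (upper tail erfc(√(x/2))), and the two-sided Fisher P-value sums the probabilities of all tables with the observed margins that are no more likely than the observed table. The function names are hypothetical, and the table in the usage note is invented for illustration, not taken from Table 2.

```python
from math import comb, erfc, sqrt


def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence; a zero margin would divide by zero.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def chi_square_p_1df(stat):
    """Upper-tail P-value of the chi-square distribution with 1 degree of freedom."""
    return erfc(sqrt(stat / 2))


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact test P-value for a 2x2 table."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of the table with top-left cell x and the same margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_observed = prob(a)
    low, high = max(0, col1 - (n - row1)), min(row1, col1)
    # Tables "as or more extreme" are those no more probable than the observed one;
    # the small tolerance keeps floating-point ties from being dropped.
    p_value = sum(prob(x) for x in range(low, high + 1) if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p_value)
```

For the hypothetical table [[3, 1], [1, 3]], chi_square_2x2 returns 2.0 (chi_square_p_1df gives P ≈ 0.157) and fisher_exact_2x2 returns 34/70 ≈ 0.486. With expected counts this small, Fisher’s exact test is the usual choice.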

Amino acid sequences of the C-terminus of CagA revealed six genotypes that differed in structural organisation owing to variation in the number and/or type of repeats (Fig. 1) [9]. The CagA types were designated A-B-D, A-B-C, A-B-B-D, A-B-C-C, A-B and A-C according to their EPIYA motifs, as previously described [10]. The sequences of repeat regions R1, R2, R3, R3′, R4 and R4′ are shown in Table 1. R1, R2 and R3 are similar to previously described sequences [9]. The R1 and R2 regions are conserved and similar between East Asian and Western type sequences. R3 and R4 resemble the Japanese strain sequence, whereas R3′ and R4′ more closely resemble the Western strain sequence. The R1 region, which contains the EPIYA motif, was present in all strains, and copies of R1 occurred at various frequencies between the R2, R3, R3′, R4 and R4′ repeat regions. Most CagA sequences had three EPIYA motifs, as in types A-B-D and A-B-C, whereas some variants had two (types A-B and A-C) or four (types A-B-B-D and A-B-C-C).
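The assignment of type strings such as A-B-D from the EPIYA motifs can be illustrated with a short motif scanner. This sketch assumes that each motif is EPIYA (or EPIYT, a variant often seen at the B site) and calls the segment type from the four residues that follow it; the signature table and the function name are hypothetical simplifications and should be checked against the repeat sequences in Table 1 before being applied to real data.

```python
import re

# Hypothetical lookup: residues immediately following the EPIYA/EPIYT pentamer,
# used to call the segment type. These short signatures are an assumption.
SEGMENT_SIGNATURES = {
    "KVNK": "A",
    "QVAK": "B",
    "TIDD": "C",
    "TIDF": "D",
}

# Zero-width lookahead so every motif start is reported, even if motifs overlap.
MOTIF = re.compile(r"(?=EPIY[AT])")


def epiya_type(protein: str, window: int = 4) -> str:
    """Return a CagA type string such as 'A-B-D' from a C-terminal CagA sequence.

    Motifs whose flanking residues match no signature are reported as '?'.
    """
    labels = []
    for match in MOTIF.finditer(protein):
        start = match.start() + 5  # first residue after the pentamer
        flank = protein[start:start + window]
        labels.append(SEGMENT_SIGNATURES.get(flank, "?"))
    return "-".join(labels)
```

For example, the invented sequence GGEPIYAKVNKKKGGEPIYAQVAKKVNAKIGGEPIYATIDFDEANQAG is reported as A-B-D; a real pipeline would instead align each segment against the reference repeats rather than rely on four residues.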

Fig. 1

Primary structures of the 3′ region of the cagA gene in clinical Helicobacter pylori isolates. The fragments are not drawn to scale.

Table 1 Amino acid sequences in the repeat region of CagA

Similarity score analysis showed that CagA type A-B-D is 80% similar to the Japanese strain and 60% similar to the Western strain. CagA type A-B-D was found in 70 strains and CagA type A-B-C in 23 strains (75.3% and 24.7%, respectively, of the 93 strains with three EPIYA motifs). One strain carried CagA type A-B-B-D, which resembled the Japanese strain, whereas 13 strains carried CagA type A-B-C-C. CagA type A-B-C-C showed only about 70% similarity to the Japanese strain but high similarity to strain 26695 (89%). CagA types A-B (two strains) and A-C (one strain) showed about 76% similarity to strain 26695.
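How the similarity scores above were computed is not specified beyond the alignment software. As an assumed stand-in, the sketch below computes percent identity between two pre-aligned sequences. Identity is stricter than similarity measures that credit conservative substitutions, so it would not necessarily reproduce the reported figures; the function name and the gap convention are hypothetical.

```python
def percent_identity(aligned1: str, aligned2: str) -> float:
    """Percent identity between two pre-aligned sequences.

    Columns where both sequences have a gap are ignored; a gap opposite a
    residue counts as a mismatch. This is one common convention, assumed here.
    """
    if len(aligned1) != len(aligned2):
        raise ValueError("sequences must be aligned to equal length")
    columns = [(a, b) for a, b in zip(aligned1, aligned2) if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)
```

For the aligned pair EPIYA-KVN and EPIYAQKVN, eight of nine columns match, giving about 88.9%.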

Table 2 shows the distribution of CagA types A-B-D and A-B-C among patients of different ethnic groups and with different diseases. A large proportion of Chinese patients were infected with H. pylori strains carrying CagA type A-B-D, whereas strains carrying CagA type A-B-C were detected predominantly in Indian and Malay patients. The difference in the distribution of CagA types A-B-D and A-B-C between Chinese and non-Chinese (Malay and Indian) patients was highly significant (P < 0.0005).

Table 2 Distribution of the subtypes of CagA types A-B-D and A-B-C among patients of different ethnic groups and disease groups

We found no association between CagA type A-B-D and disease severity, even though this type was isolated predominantly from Chinese patients. However, our data suggest a trend, albeit not statistically significant, toward a higher proportion of patients in the precancerous and cancerous groups being infected with strains carrying CagA type A-B-D, an observation that merits further study.

The first and second EPIYA motifs (in repeat regions R1 and R2), designated EPIYA-A and EPIYA-B, respectively [7], are present in almost all CagA proteins, whereas each EPIYA-C motif lies within a 34-amino-acid segment that can be duplicated. Because this 34-amino-acid sequence varies in copy number, from one to three in most Western CagA proteins, it is designated the Western CagA-specific sequence (WSS) [10, 11]. In contrast, the amino acid sequence of East Asian CagA is quite different from that of Western CagA. Most East Asian CagA proteins lack the WSS and instead possess a distinct sequence in the corresponding region, designated the East Asian CagA-specific sequence (ESS) [10], which contains an EPIYA motif designated EPIYA-D.

In our study, CagA type A-B-D contained a single ESS, whereas CagA type A-B-C contained a single WSS. All CagA type A-B-C-C proteins carried two WSS regions, hence their classification as A-B-C-C, and in a single strain (HQ326) the A-B-B-D arrangement was detected. Sequence analysis showed that the CagA type A-B-C-C strains resembled the Western CagA type, and the presence of two EPIYA-C motifs may contribute to higher phosphorylation activity and more intense cellular rearrangement. Backert et al. [12] showed that the SASEPIY sequence motif, and particularly Y-972, is of general importance for CagA phosphorylation, and demonstrated that phosphorylation of CagA at Y-972 functions in actin-based cytoskeletal rearrangements. These polymorphisms within CagA might affect the biological function of the protein and might explain the lack of a consistent correlation between CagA and disease severity noted in the present study and others [13, 14].
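The relationship between a type string and its WSS/ESS content, as described above, can be summarised in a few lines. This sketch assumes, following the text, that each EPIYA-C segment marks one WSS and each EPIYA-D segment marks one ESS. Types with neither (such as A-B) or with both are left unassigned, because the text does not say how they should be grouped; the function name is hypothetical.

```python
def classify_cagA(type_string: str) -> dict:
    """Summarise a CagA EPIYA type string such as 'A-B-C-C'.

    Counts EPIYA-C segments (one per WSS) and EPIYA-D segments (one per ESS)
    and assigns a broad lineage; ambiguous types are left unassigned.
    """
    segments = type_string.split("-")
    wss = segments.count("C")
    ess = segments.count("D")
    if ess and not wss:
        lineage = "East Asian"
    elif wss and not ess:
        lineage = "Western"
    else:
        lineage = "unassigned"
    return {"WSS": wss, "ESS": ess, "lineage": lineage}
```

Applied to the types observed here, A-B-D and A-B-B-D are called East Asian with one ESS, A-B-C and A-C are called Western with one WSS, A-B-C-C is Western with two WSS, and A-B is left unassigned.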

We examined CagA diversity among clinical isolates from patients of Malay, Chinese and Indian ethnicity and observed a mixture of H. pylori strains carrying East Asian or Western type CagA, although East Asian type strains were isolated predominantly from Chinese patients. The prevalence of East Asian and Western CagA strains among our local isolates was 64.5 and 35.5%, respectively, which is similar to that in Thailand but quite different from those in Japan and China. All of the isolates studied from Fukui, Japan, and 94.4% of those from Hangzhou, China, had East Asian type CagA [15, 16], whereas data from Thailand showed a prevalence of 53.7% for East Asian CagA and 26.8% for Western CagA strains [17, 18]. Large sequence differences have been reported to distinguish CagA function in Asian strains from that in other strains [11, 19]. The present data show that CagA diversity occurs even within a single population in Malaysia: most isolates from Chinese patients carried East Asian type CagA, whereas most isolates from Indian and Malay patients carried Western type CagA.

An important unanswered question is whether H. pylori strains with a particular CagA structure can be linked to a defined gastroduodenal disease. Yamaoka et al. [9] reported that H. pylori strains possessing more than three repeats (102 bp) in the 3′-terminal region of the cagA gene are associated with enhanced injury and reduced survival under acidic conditions. In our study, patients with peptic ulcer disease or (pre-)cancerous lesions were more likely to be infected with East Asian CagA strains, although the association was not statistically significant. Occhialini et al. [20], studying 33 H. pylori isolates from Costa Rica, found no association between either the number of repeated sequences in the 3′ region of the cagA gene or the presence of tyrosine phosphorylation motifs and the clinical origin of the strains; furthermore, they found no strains with three repeats. Further studies are required to determine whether these discrepancies arise from the variable region of cagA or instead reflect differences in the immune response of the infected populations.
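The prevalence figures can be recomputed from the type counts reported earlier (70 A-B-D, 23 A-B-C, 1 A-B-B-D, 13 A-B-C-C, 2 A-B and 1 A-C). This sketch assumes that East Asian CagA means an ESS-containing type (A-B-D or A-B-B-D) and that every remaining type, including A-B, which carries neither ESS nor WSS, is counted as Western; the text does not state how A-B was grouped. The same arithmetic shows that the type-level percentages of 75.3% and 24.7% use the 93 strains with three EPIYA motifs as their denominator.

```python
# Counts of each CagA type among the 110 isolates, as reported in the text.
TYPE_COUNTS = {"A-B-D": 70, "A-B-C": 23, "A-B-B-D": 1, "A-B-C-C": 13, "A-B": 2, "A-C": 1}

# Assumed grouping: ESS-containing types are East Asian; everything else is Western.
EAST_ASIAN = {"A-B-D", "A-B-B-D"}


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


total = sum(TYPE_COUNTS.values())
east_asian = sum(n for cag_type, n in TYPE_COUNTS.items() if cag_type in EAST_ASIAN)
western = total - east_asian
three_motif = TYPE_COUNTS["A-B-D"] + TYPE_COUNTS["A-B-C"]

print(total)                                                # 110
print(percent(east_asian, total), percent(western, total))  # 64.5 35.5
print(percent(TYPE_COUNTS["A-B-D"], three_motif),
      percent(TYPE_COUNTS["A-B-C"], three_motif))           # 75.3 24.7
```

Under this grouping the two lineage percentages sum to 100%, as expected for a complete partition of the 110 isolates.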

Strain-to-strain variation in the CagA sequence, particularly in the variable, highly hydrophilic and surface-exposed C-terminal region, could help the bacterium evade immune clearance by the host. This capability could have arisen through horizontal transfer of the cag pathogenicity island, on which the cagA gene is located, and been refined by homologous recombination between DNA from different H. pylori strains.