Journal of Molecular Medicine, Volume 82, Issue 4, pp 214–222

Human genome research in China


    • State Key Laboratory of Medical Molecular Biology, Chinese Academy of Medical Sciences and Peking Union Medical College

DOI: 10.1007/s00109-003-0515-y

Cite this article as:
Qiang, B. J Mol Med (2004) 82: 214. doi:10.1007/s00109-003-0515-y


Significant progress in human genome research has been made in China since 1994. This review aims to give a brief and incomplete introduction to the major research institutions and their achievements in human genome sequencing and functional genomics in medicine, with emphasis on the “1% Sequencing Project”, the generation of single nucleotide polymorphism and haplotype maps of the human genome, disease gene identification, and the molecular characterization of leukemia and other diseases. Chinese efforts towards the sequencing of pathogenic microbial genomes and of the rice (Oryza sativa ssp. Indica) genome are also described.


Human genome, Microbial genome, Rice genome, Human disease related genes


The aim of human genomics research is to decode the genetic information held by the human genome, and to decipher the structure and function of all human genes. The field of genome research, which has expanded from humans to other organisms, has the Human Genome Project (HGP) as its flagship. The HGP was initiated in the United States at the beginning of the 1990s, and became an international collaborative project joined by the United Kingdom, Japan, France, Germany, and China. The completion of the sequencing of the human genome was announced on 14 April 2003 by the heads of the six member states in a historic proclamation indicating that the goals of the HGP had been accomplished. Though a latecomer, China has contributed much to the field of human genome research through international collaborations and strong government support. This review summarizes recent progress in genome research achieved by major research institutions in China.

The Human Genome Project in China

The first official national project on the human genome, coordinated by Drs Zhu Chen and Boqin Qiang and sponsored by the National Natural Science Foundation of China, was announced in 1994. Owing to limited resources at that time, this first project focused on establishing the infrastructure and core technology, collecting and storing genetic samples, and training research professionals. This pioneer project marked the beginning of human genome research in China and laid the foundation for further projects which were subsequently supported by the central government through the National High-Tech Development Program (“863 Programs”), the National Developmental Program of Key Basic Research (“973 Programs”), and other research programs by the Ministry of Science and Technology (MOST), the Chinese Academy of Sciences (CAS), as well as by the Beijing, Shanghai, and other local governments. The total governmental budget for genome research and other relevant projects, mainly devoted to research running costs, is about 200 million US dollars for the years 2001–2005. Dozens of research centers and laboratories have been established nation-wide (Table 1). In addition to the central and local governments, state-owned enterprises and domestic venture capital, more recently joined by private companies and foreign investors, have also begun to invest in the research and development of genome-related technology and products. Two areas have been emphasized in human genome research in China: large-scale sequencing, genome diversity and proteomics; and the functional study of disease-related genes and other genes of scientific and economic significance.
Table 1

The major laboratories involved in human genome research in China

| Laboratory | Investigators | Research focus |
| --- | --- | --- |
| Beijing Genomics Institute (BGI) | Huanming Yang, Jun Yu, Jian Wang | Genomics and proteomics in humans, plants, animals and microbes |
| Hangzhou Genomics Institute (HGI, sister Institute of BGI) | | |
| Chinese National Human Genome Center at Shanghai (CHGCS, South China Human Genome Center) | Zhu Chen | Human genomics (primarily medical genomics), and microbial genomics |
| Chinese National Human Genome Center at Beijing (CHGB, North China Human Genome Center) | Boqin Qiang, Yan Shen | Human genomics (primarily medical genomics), and microbial genomics |
| Pathogenic Microbial Genome Research Center, Ministry of Health and Beijing Microbial Genome Center | Qi Jin, Yunde Hou | Sequencing and functional studies of the microbial genome |
| The Research Center for Human Disease Genomics, Peking University | Dalong Ma | Functional studies of human disease-related genes |
| State Key Laboratory of Molecular Oncology, Chinese Academy of Medical Sciences | Min Wu, Qimin Zhan | Genomics and molecular oncology |
| State Key Laboratory of Medical Molecular Biology, Chinese Academy of Medical Sciences | Linfang Wang, Depei Liu | Structural and functional studies of human genes and proteins |
| State Key Laboratory of Human Genome Research, and Shanghai Institute of Hematology (SIH), Shanghai Second Medical University | Zhu Chen | Medical genomics, and molecular mechanisms of leukemia |
| State Key Laboratory of Oncogenes and Related Genes, Shanghai Institute of Oncology | Jianren Gu, Shengli Yang | Genomics, molecular oncology |
| State Key Laboratory of Genetic Engineering, Fudan University | Long Yu | Human genetics and genomics |
| Shanghai Institute for Biological Sciences, Chinese Academy of Sciences | Gang Pei | Life sciences, genomics and proteomics |
| Bio-X Life Science Research Center, Shanghai Jiao Tong University | Lin He | Human genomics and genetics |
| State Key Laboratory of Medical Genetics, Zhong Nan University (Changsha, Hu Nan Province) | HanXiang Deng, Jiahui Xia | Human genomics and medical genetics |


Human genome sequencing and studies of genome diversity

China’s application to join the HGP was officially accepted at the Fifth International Strategy Meeting on Human Genome Sequencing, which was held on 1 September 1999 in Hinxton, UK. China assumed approximately 1% of the whole human genome sequencing effort, and its share thus became generally known as the “1% Sequencing Project”. Under the supervision of a seven-member executive group headed by Dr. Huanming Yang, and through the joint efforts of the Beijing Genomics Institute (formerly the Human Genome Center, Institute of Genetics, CAS), the Chinese National Human Genome Center at Shanghai (CHGCS, or the South China National Human Genome Research Center) and the Chinese National Human Genome Center, Beijing (CHGB, or the North China National Human Genome Research Center), the working draft of the assigned region was finished on schedule at the end of May 2000. China generated 653,000 reads, 131% of the originally assigned task of 500,000 reads, with a total of 27.5 Mb of non-redundant sequence. The Chinese contribution to the working draft of the human genome is summarized in [1]. China announced the completion of the assigned region on 26 August 2001, a result of the productive collaboration of these three major centers.

The International Human Genome HapMap (Haplotype Map), an international collaborative project second to the HGP aiming at linking differences in the human genome to differences in the susceptibility to human diseases, was officially launched jointly by the United States, United Kingdom, Japan, China, and Canada on 29 October 2002. The three centers mentioned above, in collaboration with other laboratories in Hong Kong and Taiwan, are responsible for around 10% of the whole effort.

Medical genome research

Genetic resource collection

Emphasis has been laid on the collection and storage of human genetic materials. An ELSI (Ethical, Legal and Social Issues) Committee was established at the very beginning of human genome research in China in 1994, with Dr. Rufu Du, professor at the Institute of Genetics, CAS, as its first chairperson. Each National Human Genome Center has an ELSI committee. Dr. Renbiao Chen, genetics professor at the Shanghai Second Medical University, is the chairman of the CHGCS ELSI committee. Dr. Renzong Qiu, professor at the Chinese Academy of Social Sciences, has served as the chairman of the CHGB ELSI committee. The ELSI committee not only supervises the ethical, legal and social issues arising in the collection of genetic resources, but also participates in many related national or international activities. Training courses were given to the technical personnel participating in the investigation and collection. Principles, procedures and standards were established for the bioethical issues. The network and database for registration, collection and data management were established. Accordingly, blood and tissue samples were collected, providing the resources for the study of genome diversity and studies of disease-related genes.

Identification of genes related to hereditary diseases

In recent years, great progress has been made in China towards identifying monogenic disease genes. The first disease-related gene isolated by Chinese geneticists was that encoding the human gap junction protein β-3 (GJB3), cloned by Dr. J.H. Xia’s group (Table 2). Further mutation analysis revealed a missense mutation and a nonsense mutation of GJB3 associated with high-frequency hearing loss in two families [2]. In 2001, two groups, Dr. Y. Shen’s group at CHGB and Dr. X.Y. Kong’s group from the Shanghai Research Center of Biotechnology, Shanghai Institute for Biological Sciences, CAS, identified the gene related to dentinogenesis imperfecta Shields type II (DGI-II). Shen et al. identified a nonsense mutation (Gln45→stop) in exon 3 of the dentine sialophosphoprotein (DSPP) gene in a Chinese family with DGI-II [3]. Kong et al. identified mutations of the DSPP gene in three Chinese families: a donor splice site mutation (G→A) in intron 3 of DSPP in family I, a missense mutation (C49→A) in exon 2 of DSPP in family II, and a missense mutation (G52→T) in exon 3 of DSPP in family III. Two of these families also have progressive hearing loss [4].
Table 2

Identification of genes related to hereditary diseases in China

| Disease | Gene | Mutation | Reference |
| --- | --- | --- | --- |
| High-frequency hearing loss | Gap junction protein β-3 (GJB3) | Family I: Glu183→Lys (G547→A) | [2] |
| | | Family II: Arg180→stop (C538→T) | |
| Dentinogenesis imperfecta Shields type 2 (DGI-II) | Dentine sialophosphoprotein (DSPP) | Gln45→stop (exon 3) | [3] |
| Dentinogenesis imperfecta type 1 (DGI-I) with or without progressive hearing loss | Dentine sialophosphoprotein (DSPP) | Family I: donor splice site (GT) in intron 3, G→A | [4] |
| | | Family II: Pro17→Thr (C49→A) | |
| | | Family III: Val18→Phe (G52→T) | |
| Brachydactyly type A-1 | Indian hedgehog (IHH) | Family I: Glu95→Lys (G283→A) | [5, 6] |
| | | Family II: Glu131→Lys (G391→A) | |
| | | Family III: Asp100→Glu (C300→A) | |
| Autosomal dominant Lamellar and Marner cataract | Heat-shock transcription factor 4 (HSF4) | Family I: Leu115→Pro (T384→C) | [7] |
| | | Danish family: Leu115→Pro (C362→T) | |
| | | Individual 1: Ala20→Asp | |
| | | Individual 2: Ile87→Val | |
| Familial atrial fibrillation (AF) | KCNQ1 (potassium channel subunit) | Ser140→Gly (A418→G) | [11] |
| Rett syndrome | X-linked methyl-CpG-binding protein | Twelve different mutations in exon 3 were identified in 17 of 31 patients, with two novel mutations | [8, 9] |
| Childhood absence epilepsy (CAE) | CACNA1H (T-type calcium channel alpha 1H subunit) | 12 missense mutations were identified in 118 patients | [10] |
| Agenesis of the permanent teeth (He-Zhao deficiency) OMIM 604625 | | | [12, 13] |
| Disseminated superficial actinic porokeratosis | | | [14] |

Brachydactyly type A-1 (BDA-1, MIM 112500) was first identified by Farabee in 1903. It is the first recorded example of a human anomaly with Mendelian autosomal dominant inheritance. Dr. L. He’s group (Table 1) mapped the locus for brachydactyly type A-1 to 2q35–36 and, on that basis, cloned the IHH (Indian hedgehog) gene. Different heterozygous missense mutations (G283→A, G391→A, and C300→A) have been identified in the region encoding the N-terminal signaling domain of the IHH gene in all patients in three large unrelated families [5, 6]. Table 2 also lists gene mutations identified in Chinese pedigrees with other disorders: autosomal dominant Lamellar and Marner cataracts [7], Rett syndrome [8, 9], and childhood absence epilepsy [10].
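The mutation entries in Table 2 pair an amino acid position with a nucleotide position, for example Arg180→stop (C538→T). The short sketch below shows how the two numbers relate. It assumes that nucleotide positions are counted from the first base of the start codon (coding-sequence numbering), which is not stated in the text; the function names are hypothetical and chosen only for illustration.

```python
def codon_index(cds_position: int) -> int:
    """Return the 1-based codon number that contains a 1-based coding position.

    Assumption: position 1 is the first base of the start codon.
    """
    if cds_position < 1:
        raise ValueError("coding-sequence positions start at 1")
    return (cds_position - 1) // 3 + 1


def position_in_codon(cds_position: int) -> int:
    """Return 1, 2 or 3: which base of its codon the position occupies."""
    if cds_position < 1:
        raise ValueError("coding-sequence positions start at 1")
    return (cds_position - 1) % 3 + 1


# Nucleotide positions and the codon numbers reported alongside them in Table 2.
reported = {538: 180, 391: 131, 300: 100, 418: 140, 49: 17, 52: 18}

for nucleotide, codon in reported.items():
    status = "consistent" if codon_index(nucleotide) == codon else "check numbering"
    print(f"nucleotide {nucleotide} -> codon {codon_index(nucleotide)} "
          f"(base {position_in_codon(nucleotide)} of the codon), "
          f"reported codon {codon}: {status}")
```

Under this assumption the listed entries agree with their codon numbers. An entry that does not agree may simply use a different numbering origin, so a mismatch is a prompt to check the original report rather than proof of an error.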

Recently, a type of familial atrial fibrillation (AF) was mapped to the 11p15.5 region by Dr. W. Huang’s group at the CHGCS and their collaborators, and a causative mutation (Ser140→Gly) of the disease-related gene, KCNQ1, was identified by whole genome scanning, single nucleotide polymorphism (SNP) analysis and haplotyping, as well as association analysis of genotype and phenotype. The KCNQ1 gene encodes the pore-forming subunit of the cardiac IKs channel (KCNQ1/KCNE1) and of the KCNQ1/KCNE2 and KCNQ1/KCNE3 potassium channels. The S140→G mutation is likely to initiate and maintain AF by reducing action potential duration and the effective refractory period in atrial myocytes [11].

In addition, several hereditary disease-related genes have been localized within the human genome. He-Zhao deficiency has been recently characterized, with a distinct form of agenesis of the permanent teeth that is different from other previously reported disorders of dentition. Using a DNA pooling method combined with two-point and multi-point linkage analysis, Dr. L. He’s group has mapped the gene locus for He-Zhao deficiency to chromosome 10q11.2 [12, 13]. The gene locus of disseminated superficial actinic porokeratosis (DSAP2) was mapped to chromosome 15q25.1–26.1 [14]. Furthermore, databases for monogenic disease mapping and cloning, pedigrees of nervous system diseases, and banks for pathologic tissues and organs have been established.

Genes related to cancer development


Significant contributions have been made to the understanding of the molecular basis of leukemia and the mechanisms of differentiation and apoptosis as well as therapies for acute promyelocytic leukemia (APL) by the Shanghai Institute of Hematology, headed by Dr. Zhu Chen. In most patients APL [15, 16] is characterized by the accumulation of promyelocytes containing a specific chromosomal translocation t(15;17) and exhibiting unique sensitivity to all-trans-retinoic acid (ATRA) [17]. Following previous studies on the molecular basis of gene rearrangement after chromosomal translocation at locus t(15;17) [17, 18], their major contributions to the analysis of gene structure and function over the last 10 years include:
  1. Characterization of important rearrangements in the RARα gene on chromosome 17 and the MYL (later named PML) gene on chromosome 15, studies of the chimeric PML-RARα and RARα-PML fusion genes [19], and sequencing of the entire genomic DNA region of the PML and RARα genes [20, 21].

  2. Discovery of an unusual karyotype 46,XY t(11;17)(q23;q21) in APL [22], analysis of the PLZF-RARα fusion gene in the truncated locus of this chromosome, and the cloning and analysis of the newly discovered PLZF (Promyelocytic Leukemia Zinc Finger) gene. PLZF encodes a potential transcription factor related to the Drosophila gap gene Kruppel and is expressed in at least two isoforms [23]. The entire genomic region of PLZF was sequenced [24]. The PLZF-RARα protein inhibits ligand-dependent transactivation of RARα, reflecting the “dominant negative” effect of PLZF-RARα on wild-type RARα [25].

  3. Establishment of PLZF-RARα and NPM-RARα transgenic mouse models [26]. PLZF-RARα transgenic animals developed chronic myeloid leukemia (CML)-like disease, whereas NPM-RARα transgenic mice showed a spectrum of phenotypes ranging from typical APL to CML.

  4. Analysis of gene expression profiling in the APL cell line NB4 before and after ATRA treatment to elucidate the molecular mechanisms of ATRA-induced differentiation of APL cells [27]. More recently, a cDNA microarray with 13,014 clones was screened to further explore these gene expression networks. The results of this screen showed that 318 genes were up-regulated and 291 were down-regulated by ATRA in NB4 cells. In the early stage of ATRA treatment, genes involved in inhibition of proliferation, cell cycle arrest and apoptosis antagonists were up-regulated, whereas with the onset of differentiation, genes related to granulocyte maturation and apoptosis agonists were up-regulated (unpublished data).

  5. Cloning of the NUP98-PMX1 fusion gene from a CML [28, 29] patient with AML-M2 transformation bearing a secondary chromosomal translocation t(1;11)(q23;p15), and the observation that NUP98-PMX1 transgenic mice developed myelodysplastic syndrome (MDS) or myeloid leukemia. The facts that NUP98-PMX1 formed complexes only with histone deacetylase (HDAC) 1 in vivo and that its inhibitory effect on FOS could be abolished upon treatment with HDAC inhibitors strongly suggest that the NUP98-PMX1 chimera functions, at least for some genes, as a transcriptional inhibitor (unpublished data). NUP98/HOX11 [30] and MLL-EEN fusion genes found in acute leukemias with t(11;12)(p15;q13) and t(11;19)(q23;p13) were also cloned, and the functions of those fusion genes are under investigation.


Nasopharyngeal carcinoma

Nasopharyngeal carcinoma (NPC) occurs with high frequency in the southern part of China, especially among people of Cantonese ancestry. No predisposing genes have been identified up to now, though the HLA-BW46 locus is associated with increased risk of NPC [31]. Dr. Y.X. Zeng’s group [32] carried out a genome-wide scan of 20 families with a high risk of NPC (2–9 affected members per family) from Guangdong Province. Fifty-four affected individuals were genotyped using 382 polymorphic microsatellite markers, covering 22 autosomes with an average marker density of 10 cM. Parametric analysis provided evidence of linkage to the D4S405 marker on chromosome 4 with a logarithm of odds for linkage (lod) score of 3.06 and a heterogeneity-adjusted lod (hlod) score of 3.21. Fine mapping with additional markers flanking D4S405 resulted in a lod score of 3.45 and hlod score of 3.67 for the region 4p15.1-q12. This region has been recently narrowed down to 8.29 cM by means of further analysis of the pedigrees with simple tandem repeat and SNP markers (unpublished data). These results indicate that a susceptibility locus on 4p15.1-q12 may account for a significant subset of hereditary NPC, which provides a solid basis for future studies to identify the NPC susceptibility gene. However, the above genome-wide scan results did not indicate any obvious linkage to chromosome 6, which contains the MHC locus, even though previous reports have suggested an association between the risk of NPC and certain MHC haplotypes [31]. A possible explanation for this discrepancy may lie in the different subjects studied: previous work was conducted with affected sib-pair families, whereas high-risk NPC pedigrees were used in this study.
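To give a sense of scale for the lod scores quoted above: a lod score is the base-10 logarithm of the ratio between the likelihood of the pedigree data under linkage and the likelihood under no linkage (recombination fraction 0.5). The sketch below converts the reported scores back into odds. The function names are hypothetical, and the code does not reproduce the parametric analysis used in the study.

```python
import math


def lod_score(likelihood_linked: float, likelihood_unlinked: float) -> float:
    """Return log10 of the likelihood ratio (linkage vs. no linkage)."""
    if likelihood_linked <= 0 or likelihood_unlinked <= 0:
        raise ValueError("likelihoods must be positive")
    return math.log10(likelihood_linked / likelihood_unlinked)


def odds_from_lod(lod: float) -> float:
    """Convert a lod score into the corresponding odds in favour of linkage."""
    return 10.0 ** lod


# lod and hlod scores quoted in the text for D4S405 and for 4p15.1-q12.
reported = {
    "lod, D4S405": 3.06,
    "hlod, D4S405": 3.21,
    "lod, fine mapping": 3.45,
    "hlod, fine mapping": 3.67,
}

for label, lod in reported.items():
    print(f"{label}: lod {lod:.2f} -> odds of about {odds_from_lod(lod):,.0f} to 1")
```

A lod score of 3, corresponding to odds of 1000 to 1, is the conventional threshold for declaring linkage in a parametric analysis; all four reported values lie above it.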

Hepatocellular carcinoma

To identify genetic abnormalities in human primary hepatocellular carcinoma (HCC), microsatellite analysis was performed on 60 Chinese HCC specimens versus non-cancerous liver tissues as controls. To further evaluate the nature of the allelic loss, comparative genomic hybridization was performed on 20 pairs of the above HCC samples. The combined analysis of these two methods revealed frequent allelic loss on 17p, 9p21-p23, 16q21-q23.3, 13q, 8p21-p23 and 6q24-q27, whereas the most frequent allelic gain occurred at 1q, 17q and 8q24. The highest incidence of allele loss was at 17p13.3 (65%) [33]. Twenty-two paired HCC and non-cancerous liver samples were then analyzed with 14 polymorphic markers. The data revealed a high level of loss of heterozygosity (>68%) in a minimum region between D17S1866 and D17S1574, spanning 1.5 Mb. A physical map was created based on large-scale sequencing of relevant cluster bacterial artificial chromosome/artificial chromosome clones. Seventeen known genes and 13 novel genes were identified in this minimum region, and the functions of these genes were characterized [34, 35, 36].

Recently, a comprehensive characterization of gene expression profiles of hepatitis B virus-positive HCC was initiated through the generation of a large set of 5′ expressed sequence tag (EST) clusters (11,065 ESTs) from HCC and non-cancerous liver tissue samples. These ESTs were then applied to a cDNA microarray system containing 12,393 genes/ESTs, and a commercial cDNA array of 1,176 known genes as target sequence sources. The integrated data from this study identified 2,253 genes/ESTs as candidates showing differential expression. A number of these differentially expressed genes were verified by RT-PCR, which revealed that many were involved in cell cycle regulation, such as the cyclin-dependent kinases, cell cycle negative regulators and metabolic regulators. Some of these candidate genes may be related to cancer cell differentiation. Also, the altered transcriptome profile of HCC may arise from a number of chromosome regions exhibiting loss of heterozygosity or amplification [37].

Esophageal squamous cell carcinoma

Esophageal squamous cell carcinoma is one of the most common malignancies in the world, and has been the fourth leading cause of cancer death in China since the beginning of the 1980s. Dr. M. Wu’s group performed comparative genomic hybridization to detect copy number changes in DNA. The most common gains were observed at 3q, 8q, 1q, 20q, 20p, 5p, 15q, and 9q, and losses at 3p, 13q, 18q, 9p, 4, Xp and others [38]. Studies of loss of heterozygosity (LOH) were conducted using microsatellite markers. Four minimal deletion regions of overlap were found at 3p14.2, 3p26, 13q12.3 and 13q14.1-q14.3 [39]. This group has also studied the expression patterns of a dozen genes such as TGase 3, Mal and RH50 in esophageal carcinomas and has identified their association with human cancers for the first time. The full-length cDNA and genomic DNAs of two novel genes (DRC1 and DRC2) were characterized, and their chromosome location and tissue-specific distribution of expression were elucidated [40, 41, 42, 43]. They have found that the expression of NMES1 and gut-enriched Kruppel-like factor was down-regulated [44, 45], while EC45 was overexpressed [46] in esophageal cancer. They also showed that down-regulation of the three annexin I isoforms, identified by proteomics, was a frequent event in esophageal carcinogenesis [47], and that the alteration of p63 might play a significant role in the early steps in tumorigenesis [48].

Genes associated with other common diseases

Type 2 diabetes

SNP screening and case-control association analysis were carried out for 37 candidate genes located in a 4.3 Mb region of the p terminus of chromosome 1 [49]. Several genes were identified as susceptibility genes for type 2 diabetes in the northern Chinese population. These included the CDC2L2 (cell division cycle 2-like kinase 2), PGD (phosphogluconate dehydrogenase), caspase 9, PRKCZ (protein kinase Cζ), SAC (soluble adenylate cyclase), and UTSH (urotensin II) genes [50, 51]. Dr. M. Luo’s group at the Shanghai Institute of Endocrinology, in collaboration with researchers at the CHGCS, carried out SNP analysis for 11 candidate genes involved in insulin signal transduction, lipid metabolism and energy synthesis. The total length of these 11 candidate genes is 57,782 bp, with 87 SNPs unevenly distributed across them, of which 54 occur at high frequency and 33 at low frequency.
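The SNP counts reported for the 11 candidate genes can be turned into an average SNP density with a few lines of arithmetic. The sketch below uses only the figures quoted in the text; the variable names are illustrative, and because the SNPs are described as unevenly distributed, the average says nothing about any individual gene.

```python
# Figures quoted in the text for the 11 candidate genes.
total_length_bp = 57_782
high_frequency_snps = 54
low_frequency_snps = 33
total_snps = high_frequency_snps + low_frequency_snps

# Average spacing and density of SNPs across the surveyed sequence.
bp_per_snp = total_length_bp / total_snps
snps_per_kb = total_snps / (total_length_bp / 1000)
high_frequency_share = high_frequency_snps / total_snps

print(f"total SNPs: {total_snps}")
print(f"average spacing: one SNP per {bp_per_snp:.0f} bp")
print(f"average density: {snps_per_kb:.2f} SNPs per kb")
print(f"share of high-frequency SNPs: {high_frequency_share:.0%}")
```

On these numbers the surveyed genes carry roughly one SNP every 660 bp on average, and a little under two thirds of the SNPs fall in the high-frequency class.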

Essential hypertension

Essential hypertension (EH) is a common late-onset disease that exhibits complex genetic heterogeneity. Several genome-wide scans recently completed in ethnic Chinese populations revealed a number of candidate loci possibly contributing to EH, of which 2q14-q23 and 5q32 appeared to be replicable [52, 53]. D.L. Zhu et al. reported linkage to the chromosome 2q14-q23 region for EH in a south China population [54, 55]. Several genes located in that region are considered to be relevant to the regulation of blood pressure and the development of EH. These genes include several G proteins and G protein-coupled receptors, a voltage-gated sodium channel protein and the gamma subunit of the sodium/potassium ATPase [56]. Another region, 5q32, harboring the β2-adrenergic receptor (β2AR) gene, has been reported to be linked to EH in a Taiwanese population [57]. To replicate these results and perform quantitative linkage analysis, D.F. Gu et al. genotyped members of 148 hypertension families containing 328 affected sib pairs, and grouped families from Beijing and Jiangsu province with five highly informative microsatellite markers (D2S151, D2S142, D5S2090, D5S413 and D5S2013), but the results provided no evidence in support of significant linkage of 2q14-q23 or 5q32 with EH [58]. Any explanation for the above results will be complicated by the number of factors involved. At best, these observations indicate the diversity in the etiology and complexity of hypertension.

Another report recently published by Gu’s group showed that a region of chromosome 8 flanking the LPL gene might contribute to individual blood pressure variations in Chinese. This study involved linkage analysis in 148 Chinese hypertensive families [59, 60]. Using the linkage model in SOLAR, a 10.6 cM region on chromosome 8 (8p22) linked to systolic blood pressure was identified by markers D8S1145, D8S261 and D8S282, with a maximum two-point LOD score of 2.52 at D8S261 and a maximum multipoint LOD score of 2.03 near the marker D8S261. In the qualitative trait linkage analysis, evidence for linkage between the marker D8S1145 and EH was found (P=0.029); TDT/S-TDT also supported significant linkage disequilibrium with EH at allele 3 of D8S261 (χ2=8.643, P=0.01). The results are consistent with W.H. Pan et al.’s report in 2000 [57].

Gene expression profiling and full-length cDNA cloning

More than 150,000 ESTs have been cloned and sequenced from human blood stem and progenitor cells [61], dendritic cells [62], the hypothalamus-pituitary-adrenal system [63], the cardiovascular system, fetal liver [64], fetal brain, testis, and other tissues and organs. Furthermore, thousands of full-length cDNAs of genes related to development, differentiation, and signal transduction have been cloned from cDNA libraries constructed from the above cells or tissues.

In work reported by Mao et al. [61], genes expressed in human umbilical cord blood CD34+ cells were catalogued by partially sequencing a large number of EST clones and analyzing these sequences with bioinformatics tools. Three hundred cDNAs containing putative entire open reading frames for previously undefined genes were obtained, based on EST cataloging, clone sequencing, in silico cloning, and rapid amplification of cDNA ends [65]. Six novel Kruppel-like zinc finger genes from hematopoietic cells were cloned, and a novel transregulatory domain KRNB was identified [66]. Distinct gene expression patterns were found between CD34+ cells from normal bone marrow and from AML-M5 transformed from myelodysplastic syndrome [67].

Microbial genome research

Research on microbial genomes is one of the hot topics in life science worldwide. The first microbial genome sequence contributed by China to the public international databases was that of Shigella flexneri (Serotype 2A), the most prevalent species and serotype causing bacillary dysentery or shigellosis in man. This sequencing project was completed by the Pathogenic Microbial Genome Center of Ministry of Public Health, and the CHGB. The genome size of S. flexneri (Serotype 2A) is 4,607,203 bp, with an extra virulence plasmid of 221,618 bp [68]. Its genome has, astonishingly, 314 IS elements and hundreds of pseudogenes. Genes encoding Shigella outer membrane proteins with potential for vaccine development, and toxic factor genes with potential as targets for drug development, have been cloned and characterized.

The complete genome sequence of a representative virulent serovar type strain (Lai) of Leptospira interrogans serogroup Icterohaemorrhagiae has been recently published [69]. The whole genome consists of a 4,332,241 bp large circular chromosome and a 358,943 bp small chromosome with a total of 4,768 predicted genes. A comprehensive analysis of the Leptospira interrogans genes for chemotaxis/motility and lipopolysaccharide synthesis provides a basis for in-depth studies of virulence and pathogenesis.

Sequencing of the genomes of other microbial species, including Staphylococcus epidermidis and Xanthomonas campestris, is in progress. In addition, the genome project for Schistosoma japonicum is underway.

The first extremophile sequenced by China was Thermoanaerobacter tengcongensis, a thermophilic bacterium isolated from Tengchong, Yunnan. The complete sequence was published by the Beijing Genomics Institute and its collaborators [70].

The World Health Organization officially announced that a variant coronavirus is the pathogen responsible for severe acute respiratory syndrome (SARS), which broke out in China in the spring of 2003. Five isolates from SARS patients identified in Guangdong and Beijing were sequenced and deposited into GenBank by Chinese scientists [71].

Rice and other important organisms in China

With financial support from MOST, the Shanghai Biochemistry Institute, CAS, established a National Sequencing Center headed by Dr. Guofan Hong in the 1990s. The complete sequence map of chromosome 4 of Oryza sativa ssp. japonica was finished in that center in 2002 under the supervision of Dr. Bing Han, and was one of the two rice chromosomes to be completely sequenced [72]. The finished sequence spans 34.6 Mb and represents 97.3% of the chromosome. In addition, it has the longest known sequence for a plant centromere, a completely sequenced contig of 1.16 Mb corresponding to the centromeric region of chromosome 4. A total of 4,658 genes predicted from these sequences match the available unique rice ESTs.

A draft sequence of the rice (Oryza sativa ssp. Indica) genome was published [73] by the Beijing Genomics Institute and its sister center in Hangzhou (Hangzhou Genomics Institute), using the large-scale sequencing facilities and technology established through participation in the HGP. The completion of the whole rice genome fine map was announced in October 2002. All the rice sequence data have been made freely available as a Chinese contribution to the world under the banner of “owned by all, done by all, and shared by all”.

The pig is an important domestic animal of both medical and economic significance. A pig genome survey has been accomplished by the Beijing Genomics Institute in collaboration with the Royal Agricultural and Veterinary University, the National Institute of Agriculture, and Aarhus University, Denmark, with financial support from the Danish Pig Production Committee and CAS. cDNAs from approximately 100 different tissues or different developmental stages have also been sequenced. In addition, chicken, silkworm and other genomes are in the process of being sequenced by the Beijing Genomics Institute, and the South and North China National Human Genome Research Centers.


Over the past ten years, Chinese scientists have made substantial contributions to genome research, especially with the accomplishment of 1% of the effort for the Human Genome Project and in the identification of genes related to hereditary diseases. China is a country with rich resources in terms of population size and biodiversity, so the major challenge for us is how to utilize our unique genetic resources for identifying other genes of biological significance. We are hoping to make further contributions to functional genomic research in collaboration with scientists in other countries.

As one of the chief coordinators of the genome projects in China, I am proud to note that genome research has attracted so many young, and established, Chinese scientists who have been well trained either in the West or in China. They have proved themselves by their enthusiasm for science, and their experience in research and its management. They have been the major contributors to this field. They, following in the steps of their predecessors, have laid a solid foundation for further development and will build the future of life science and biotechnology in China.


The author appreciates the sharing of published and unpublished data by Drs Zhu Chen, Jianren Gu, Fude Fang, Mingrong Wang, Yan Shen, Qi Jin, Dongfeng Gu, Lin He, Wei Huang, Min Lou, Guoping Zhao, Jiahui Xia and other investigators. I thank Drs Huanming Yang, Saijuan Chen and Jianren Gu for contributions to preparing this review, and Dr. Dahai Zhu for kind suggestions. I thank the Ministry of Science and Technology and the National Natural Science Foundation of China and other funding agencies for their support.

Copyright information

© Springer-Verlag 2004