Abstract
Since the launch of the human genome sequencing project in the 1990s, genomic research has achieved concrete results. By the beginning of the present century, the complete genomes of several model organisms had been sequenced, including a number of prokaryotic microorganisms and the eukaryotes yeast (Saccharomyces cerevisiae), nematode (Caenorhabditis elegans), fruit fly (Drosophila melanogaster) and thale cress (Arabidopsis thaliana), as well as the major part of the human genome. These achievements signified that a new era of data mining and analysis of the human genome had begun. The language of human genetics would gradually be read and understood, and the genetic information underlying metabolism, development, differentiation and evolution would progressively become known. Large amounts of data are accumulating, but many of the rules needed to interpret this information remain unknown. Bioinformatics research is thus not only becoming more important but also faces severe challenges as well as great opportunities.
Author information
Additional information
Professor Runsheng Chen is a Professor in the Systems Biology Research Center and the National Laboratory of Biomacromolecules at the Institute of Biophysics, Chinese Academy of Sciences. He is also a member of the Human Genome Organization (HUGO) and of the biomacromolecule group of the Committee on Data for Science and Technology (CODATA). From 1992 to 1996 he was a member of the Biophysics Professional Committee of the International Union of Pure and Applied Physics (IUPAP), and he has served as General-Secretary and Vice President of the Chinese Society of Biophysics. He graduated in 1964 from the Department of Biophysics of the University of Science and Technology of China. From 1985 to 1987 he studied the electronic structure of biomacromolecules at the University of Erlangen-Nürnberg as a fellow of the Alexander von Humboldt Foundation. Since then he has engaged in research cooperation with The Hong Kong University of Science and Technology, The Chinese University of Hong Kong, Osaka University, the University of Erlangen-Nürnberg, the University of California, Los Angeles, and Harvard University. In October 1996, Prof. Chen was invited to give a lecture entitled “From DNA sequence database to protein three-dimensional structure” at the 15th International CODATA Conference, and won the “Kotani Prize”. In 2007, he was elected a Member of the Chinese Academy of Sciences. Professor Chen was awarded the “Ho Leung Ho Lee Prize” in 2008.
Prof. Chen has been engaged in bioinformatics research for many years. He was the first in China to accomplish the assembly and gene annotation of a complete bacterial genome. He has further established statistical DNA sequence analysis, fractal dimension analysis, and work on neural networks, complexity, local area degeneracy factor analysis, cryptology and other methodologies. Among these, Prof. Chen was the first to set up cryptology studies in China. He also took part in the sequencing of 1% of the human genome and in the computer analysis of the rice genome draft. Over 20 years, Prof. Chen has carried out systematic studies in the field of bioinformatics and published more than 120 SCI papers; he has also been invited many times to give reports at international academic conferences.
About this article
Cite this article
Chen, R., Skogerbø, G. Bioinformatics — Mining the genome for information. Front. Electr. Electron. Eng. China 5, 391–404 (2010). https://doi.org/10.1007/s11460-010-0109-8
DOI: https://doi.org/10.1007/s11460-010-0109-8