Bioinformatics — Mining the genome for information

  • Research Article
Frontiers of Electrical and Electronic Engineering in China

Abstract

Since the launch of the human genome sequencing project in the 1990s, genomic research has achieved concrete results. By the beginning of the present century, the complete genomes of several model organisms had been sequenced, including a number of prokaryotic microorganisms and the eukaryotes yeast (Saccharomyces cerevisiae), nematode (Caenorhabditis elegans), fruit fly (Drosophila melanogaster) and thale cress (Arabidopsis thaliana), as well as the major part of the human genome. These achievements signaled that a new era of data mining and analysis of the human genome had begun. The language of human genetics would gradually be read and understood, and the genetic information underlying metabolism, development, differentiation and evolution would progressively become known. Large amounts of data are already accumulating, but many of the rules needed to interpret this information are still unknown. Bioinformatics research is thus not only becoming more important but also faces severe challenges as well as great opportunities.

Author information

Corresponding author

Correspondence to Runsheng Chen.

Additional information

Professor Runsheng Chen is a Professor in the Systems Biology Research Center and the National Laboratory of Biomacromolecules at the Institute of Biophysics, Chinese Academy of Sciences. He is also a member of the Human Genome Organization (HUGO) and of the biomacromolecule group of the Committee on Data for Science and Technology (CODATA). From 1992 to 1996 he was a member of the Biophysics Professional Committee of the International Union of Pure and Applied Physics (IUPAP), and he formerly served as General-Secretary and Vice President of the Chinese Society of Biophysics. He graduated in 1964 from the Department of Biophysics of the University of Science and Technology of China. From 1985 to 1987 he studied the electronic structure of biomacromolecules at the University of Erlangen-Nürnberg as a fellow of the Alexander von Humboldt Foundation. He has since collaborated on research with The Hong Kong University of Science and Technology, The Chinese University of Hong Kong, Osaka University, the University of Erlangen-Nürnberg, the University of California, Los Angeles, and Harvard University. In October 1996, Prof. Chen was invited to give a lecture entitled “From DNA sequence database to protein three-dimensional structure” at the 15th International CODATA Conference, and won the “Kotani Prize”. In 2007, he was elected a Member of the Chinese Academy of Sciences. He was awarded the “Ho Leung Ho Lee Prize” in 2008.

Prof. Chen has been engaged in bioinformatics research for a number of years. He was the first in China to complete the assembly and gene annotation of a complete bacterial genome. He has further established statistical DNA sequence analysis, fractal dimension analysis, and work on neural networks, complexity, local area degeneracy factor analysis, cryptology and other methodologies. Among these, Prof. Chen was the first to establish cryptology studies in China. He also took part in the sequencing of 1% of the human genome and in the computer analysis of the rice genome draft. Over 20 years, Prof. Chen has carried out systematic studies in bioinformatics and published more than 120 SCI papers; he has also been invited many times to present at international academic conferences.

About this article

Cite this article

Chen, R., Skogerbø, G. Bioinformatics — Mining the genome for information. Front. Electr. Electron. Eng. China 5, 391–404 (2010). https://doi.org/10.1007/s11460-010-0109-8

  • DOI: https://doi.org/10.1007/s11460-010-0109-8
