Bioinformatics — Mining the genome for information

Chen, Runsheng; Skogerbø, Geir

doi:10.1007/s11460-010-0109-8

Research Article
Published: 18 August 2010

Volume 5, pages 391–404, (2010)

Frontiers of Electrical and Electronic Engineering in China

Runsheng Chen¹ &
Geir Skogerbø¹

Abstract

Since the launch of the human genome sequencing project in the 1990s, genomic research has achieved substantial results. By the beginning of the present century, the complete genomes of several model organisms had been sequenced, including a number of prokaryotic microorganisms and the eukaryotes yeast (Saccharomyces cerevisiae), nematode (Caenorhabditis elegans), fruit fly (Drosophila melanogaster) and thale cress (Arabidopsis thaliana), as well as the major part of the human genome. These achievements signified that a new era of data mining and analysis of the human genome had commenced. The language of human genetics would gradually be read and understood, and the genetic information underlying metabolism, development, differentiation and evolution would progressively become known to mankind. Large amounts of data are already accumulating, but many of the rules that should guide the understanding of this information are still unknown. Bioinformatics research is thus not only becoming more important but also faces severe challenges as well as great opportunities.

Author information

Authors and Affiliations

National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
Runsheng Chen & Geir Skogerbø

Corresponding author

Correspondence to Runsheng Chen.

Additional information

Professor Runsheng Chen is now a professor at the Systems Biology Research Center and the National Laboratory of Biomacromolecules at the Institute of Biophysics, Chinese Academy of Sciences. He is also a member of the Human Genome Organization (HUGO) and of the biomacromolecule group of the Committee on Data for Science and Technology (CODATA). From 1992 to 1996 he was a member of the Biophysics Professional Committee of the International Union of Pure and Applied Physics (IUPAP), and he has served as General Secretary and Vice President of the Chinese Society of Biophysics. He graduated in 1964 from the Department of Biophysics of the University of Science and Technology of China. From 1985 to 1987 he studied the electronic structure of biomacromolecules at the University of Erlangen-Nürnberg as a fellow of the Alexander von Humboldt Foundation. Since then he has engaged in research cooperation with The Hong Kong University of Science and Technology, The Chinese University of Hong Kong, Osaka University, the University of Erlangen-Nürnberg, the University of California, Los Angeles, and Harvard University. In October 1996, Prof. Chen was invited to give a lecture entitled “From DNA sequence database to protein three-dimensional structure” at the 15th International CODATA Conference, and won the “Kotani Prize”. In 2007, he was elected a Member of the Chinese Academy of Sciences. He was awarded the “Ho Leung Ho Lee Prize” in 2008.

Prof. Chen has been engaged in bioinformatics research for many years. He was the first in China to accomplish the assembly and gene annotation of a complete bacterial genome. He has further established methodologies including statistical DNA sequence analysis, fractal dimension analysis, neural networks, complexity analysis, local area degeneracy factor analysis and cryptology. Among these, Prof. Chen was the first to establish cryptology studies in China. He also took part in the sequencing of 1% of the human genome and in the computer analysis of the rice genome draft. Over 20 years, Prof. Chen has pursued systematic research in bioinformatics and published more than 120 SCI papers; he has also been invited many times to speak at international academic conferences.

About this article

Cite this article

Chen, R., Skogerbø, G. Bioinformatics — Mining the genome for information. Front. Electr. Electron. Eng. China 5, 391–404 (2010). https://doi.org/10.1007/s11460-010-0109-8

Received: 31 March 2010
Accepted: 25 April 2010
Published: 18 August 2010
Issue Date: September 2010
DOI: https://doi.org/10.1007/s11460-010-0109-8
