Abstract
Biological data, exemplified by data from omics platforms, are accumulating exponentially. Like other data-intensive scientific disciplines such as high-energy physics, climatology, meteorology, geology, geography and environmental sciences, modern life sciences have entered the information-rich era, the era of the fourth paradigm. The creation of a Chinese information engineering infrastructure for pan-omics studies (CIEIPOS), as part of the national scientific infrastructure, is long overdue; it is needed to accelerate the further development of Chinese life sciences and to translate rich data into knowledge and medical applications. Drawing on the current status of the international and Chinese bioinformatics communities in collecting, managing and utilizing biological data, the essay stresses the significance and urgency of creating a ‘data hub’ within CIEIPOS, and discusses the challenges of, and possible solutions for, integrating, querying and visualizing these data. Another important component of CIEIPOS, one that is absent from traditional biological data centers such as NCBI and EBI, is omics informatics. The mass spectrometry platform is taken as an example to illustrate the complexity of omics informatics, and its heavy dependence on computational power is highlighted. Meeting the demand for such power in omics studies is argued to be the fundamental function of CIEIPOS. The outlook for implementing CIEIPOS in terms of hardware and networks is also discussed.
Additional information
This article is published with open access at Springerlink.com
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Zhu, W., Zhu, Y. & Yang, X. Information engineering infrastructure for life sciences and its implementation in China. Sci. China Life Sci. 56, 220–227 (2013). https://doi.org/10.1007/s11427-013-4440-1
DOI: https://doi.org/10.1007/s11427-013-4440-1