Big Data Analytics and Its Prospects in Computational Proteomics
The volume and variety of biological data are growing exponentially. Every week new proteins are sequenced and novel structures are discovered. With the emergence of previously unknown diseases, it has become imperative that vaccines and drugs be designed as quickly as possible. The result is an immense surge of information that is becoming increasingly difficult to process with limited computational resources. The need of the hour is therefore to harness Big Data technologies, which help distribute computations over a group of nodes and speed up data analysis. In this paper we explore some techniques for distributing data analysis jobs across several computers that work in parallel and reach a solution faster.
Keywords: Big data; Computational proteomics; Hadoop; MapReduce; Parallel implementation
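As a rough illustration of the MapReduce pattern named in the keywords, the sketch below counts amino-acid residues across a few protein sequences. It is not the method used in the paper: the function names, the toy sequences, and the use of a local thread pool in place of Hadoop cluster nodes are all assumptions made for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_count(sequence):
    """Map step: count the residues in a single protein sequence."""
    return Counter(sequence)


def reduce_counts(left, right):
    """Reduce step: merge two partial residue counts."""
    return left + right


def residue_composition(sequences, workers=2):
    """Count residues over all sequences with a map step and a reduce step."""
    # Assumption: local threads stand in for the nodes of a cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(map_count, sequences))
    # Fold the per-sequence counts into one overall count.
    return reduce(reduce_counts, partial_counts, Counter())


# Hypothetical toy input; real sequences would come from a protein database.
toy_sequences = ["MKV", "MAA", "KK"]
totals = residue_composition(toy_sequences)
```

Each sequence is processed independently in the map step, so the work can be split across as many workers as are available; only the small per-sequence counts need to be combined in the reduce step.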