Bioinformatics from a Big Data Perspective: Meeting the Challenge

Gomez-Vela, Francisco; López, Aurelio; Lagares, José A.; Baena, Domingo S.; Barranco, Carlos D.; García-Torres, Miguel; Divina, Federico

doi:10.1007/978-3-319-56154-7_32

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 10209)

Included in the following conference series:

International Conference on Bioinformatics and Biomedical Engineering

Abstract

Recently, the rise of the Big Data paradigm has had a great impact on several fields. Bioinformatics is one such field. In fact, Bioinformatics has had to evolve in order to adapt to this phenomenon. The exponential increase in the biological information available has forced researchers to find new solutions to handle these new challenges.

In this paper we present our point of view on the problems intrinsic to Big Data (volume, velocity, variety and veracity), how they affect the Bioinformatics field, and some solutions that can help Bioinformatics practitioners to deal with the difficulties presented by Big Data.

Acknowledgement

This work has been funded by the Spanish Ministry of Science and Innovation under grant TIN2015-64776-C3-2-R.

Author information

Authors and Affiliations

Intelligent Data Analysis (DATAi), Division of Computer Science, Pablo de Olavide University, 41013, Seville, Spain
Francisco Gomez-Vela, Aurelio López, José A. Lagares, Domingo S. Baena, Carlos D. Barranco, Miguel García-Torres & Federico Divina

Corresponding author

Correspondence to Francisco Gomez-Vela.

Editor information

Editors and Affiliations

Universidad de Granada, Granada, Spain
Ignacio Rojas
Universidad de Granada, Granada, Spain
Francisco Ortuño

About this paper

Cite this paper

Gomez-Vela, F. et al. (2017). Bioinformatics from a Big Data Perspective: Meeting the Challenge. In: Rojas, I., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2017. Lecture Notes in Computer Science, vol 10209. Springer, Cham. https://doi.org/10.1007/978-3-319-56154-7_32

DOI: https://doi.org/10.1007/978-3-319-56154-7_32
Published: 01 April 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56153-0
Online ISBN: 978-3-319-56154-7
eBook Packages: Computer Science (R0)
