Abstract
Big Data, the handling of massive volumes of data, is a popular buzzword and a paradigm that can change the game in any data-intensive field. Many data-intensive fields adopt Big Data technology to pursue their goals, and it gives organizations a new direction. In particular, Big Data plays a crucial role in Big Biomedical Data Engineering (BBDE). The massive amount of biomedical data poses a dilemma for analysis, diagnosis, and prediction; moreover, large-scale medical data cannot be stored and processed without Big Data technology. Deploying Big Data technology can therefore change the game of biomedical engineering. This chapter explores the role of Big Data in biomedical data engineering and its storage dilemma.
Keywords
- Big Data
- Big Data analytics
- Big Data storage
- Biomedical data analytics
- Biomedical
- Biomedical data engineering
- Healthcare
- Cancer genome
- Neurology
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Patgiri, R., Nayak, S. (2020). Big Biomedical Data Engineering. In: Arabnia, H.R., Daimi, K., Stahlbock, R., Soviany, C., Heilig, L., Brüssau, K. (eds) Principles of Data Science. Transactions on Computational Science and Computational Intelligence. Springer, Cham. https://doi.org/10.1007/978-3-030-43981-1_3
DOI: https://doi.org/10.1007/978-3-030-43981-1_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-43980-4
Online ISBN: 978-3-030-43981-1
eBook Packages: Engineering, Engineering (R0)