An Emerging Role for Polystores in Precision Medicine

  • Edmon Begoli
  • J. Blair Christian
  • Vijay Gadepally
  • Stavros Papadopoulos
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10494)


Medical data is inherently heterogeneous and usually varies significantly in both size and composition. Yet this data is also key to the recent and promising field of precision medicine, which focuses on identifying and tailoring appropriate medical treatments to the needs of individual patients, based on their specific conditions, medical history, lifestyle, genetics, and other individual factors. Because we, and the database community at large, recognize that no “one size fits all” solution can work with such data, we present observations drawn from our experience and from applications in the field of precision medicine. We make the case for a polystore architecture and explain how it applies to precision medicine. We then discuss the reference architecture, describe some of its critical components (array database), and discuss the specific types of analysis that directly benefit from this database architecture and the ways it serves the data.


Polystore, Precision medicine, Genomics, Array database



This manuscript has been authored in part by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the U.S. Department of Energy, and under a joint program (MVP CHAMPION) between the U.S. Department of Energy and the U.S. Department of Veterans Affairs.

The authors would like to thank the Intel Science and Technology Center (ISTC) for Big Data and the BigDAWG contributors for their role in developing the BigDAWG system.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Edmon Begoli (1)
  • J. Blair Christian (1)
  • Vijay Gadepally (2)
  • Stavros Papadopoulos (3)

  1. Oak Ridge National Laboratory, Oak Ridge, USA
  2. Massachusetts Institute of Technology (MIT), Lincoln Laboratory, Lexington, USA
  3. TileDB, Inc., Cambridge, USA
