Database-as-a-Service for Long-Tail Science

  • Conference paper
Scientific and Statistical Database Management (SSDBM 2011)

Abstract

Database technology remains underused in science, especially in the long tail — the small labs and individual researchers that collectively produce the majority of scientific output. These researchers increasingly require iterative, ad hoc analysis over ad hoc databases but cannot individually invest in the computational and intellectual infrastructure required for state-of-the-art solutions.

We describe a new “delivery vector” for database technology called SQLShare that emphasizes ad hoc integration, query, sharing, and visualization over pre-defined schemas. To empower non-experts to write complex queries, we synthesize example queries from the data itself and explore limited English hints to augment the process. We integrate collaborative visualization via a web-based service called VizDeck that uses automated visualization techniques with a card game metaphor to allow creation of interactive visual dashboards in seconds with zero programming.
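
The abstract does not say how example queries are synthesized from the data. The following is only a minimal sketch of the general idea, assuming a simple type-based heuristic; it is not SQLShare's actual method. It uses SQLite from the Python standard library, and the function name `synthesize_example_queries`, the table `samples`, and its columns are hypothetical names introduced for illustration.

```python
import sqlite3


def synthesize_example_queries(conn, table):
    """Build starter SQL queries from a table's columns and declared types.

    Hypothetical helper: a type-based heuristic, not the paper's algorithm.
    """
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    columns = conn.execute(f"PRAGMA table_info('{table}')").fetchall()
    # Always offer a simple preview query first.
    queries = [f'SELECT * FROM "{table}" LIMIT 10']
    for _cid, name, decl_type, *_rest in columns:
        kind = decl_type.upper()
        if any(t in kind for t in ("INT", "REAL", "FLOA", "DOUB", "NUM")):
            # Numeric column: suggest a range summary.
            queries.append(
                f'SELECT MIN("{name}"), MAX("{name}"), AVG("{name}") FROM "{table}"'
            )
        else:
            # Any other column: suggest a frequency count of its values.
            queries.append(
                f'SELECT "{name}", COUNT(*) AS n FROM "{table}" '
                f'GROUP BY "{name}" ORDER BY n DESC LIMIT 10'
            )
    return queries


# Assumed toy dataset standing in for an uploaded research table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (station TEXT, depth_m REAL)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?)",
    [("A", 5.0), ("A", 10.0), ("B", 7.5)],
)

examples = synthesize_example_queries(conn, "samples")
for query in examples:
    print(query)
    print(conn.execute(query).fetchall())
```

Each generated query is runnable as-is, so a non-expert could start from a working example and edit it rather than write SQL from scratch.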

We present data on the initial uptake and usage of the system and report preliminary results from testing new features on the datasets collected during the initial pilot deployment. We conclude that the SQLShare system and associated services have the potential to increase uptake of relational database technology in the long tail of science.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Howe, B. et al. (2011). Database-as-a-Service for Long-Tail Science. In: Bayard Cushing, J., French, J., Bowers, S. (eds) Scientific and Statistical Database Management. SSDBM 2011. Lecture Notes in Computer Science, vol 6809. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22351-8_31

  • DOI: https://doi.org/10.1007/978-3-642-22351-8_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22350-1

  • Online ISBN: 978-3-642-22351-8

  • eBook Packages: Computer Science, Computer Science (R0)