Open Data Science

  • Conference paper
Advances in Intelligent Data Analysis XVII (IDA 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11191)

Abstract

The increasing openness of data, methods, and collaboration networks has created new opportunities for research, citizen science, and industry. Whereas openly licensed scientific, governmental, and institutional data sets can now be accessed through programmatic interfaces, compressed archives, and downloadable spreadsheets, realizing the full potential of open data streams depends critically on the availability of targeted data analytical methods, and on user communities that can derive value from these digital resources. Interoperable software libraries have become a central element in modern statistical data analysis, bridging the gap between theory and practice, while open developer communities have emerged as a powerful driver of research software development. Drawing insights from a decade of community engagement, I propose the concept of open data science, which refers to the new forms of research enabled by open data, open methods, and open collaboration.

Notes

  1. https://ropensci.org.

  2. http://ec.europa.eu/eurostat/data/database.

  3. https://ropensci.org.

  4. http://ropengov.github.io.

  5. The gisfin and helsinki packages; see http://ropengov.github.io.

  6. https://project-open-data.cio.gov.

Acknowledgements

I am grateful to the rOpenGov contributors, in particular Joona Lehtomäki, Markus Kainu, and Juuso Parkkinen, and to our close collaborator Mikko Tolonen. The work has been partially funded by the Academy of Finland (decisions 295741, 307127).

Author information

Corresponding author

Correspondence to Leo Lahti.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Lahti, L. (2018). Open Data Science. In: Duivesteijn, W., Siebes, A., Ukkonen, A. (eds) Advances in Intelligent Data Analysis XVII. IDA 2018. Lecture Notes in Computer Science, vol 11191. Springer, Cham. https://doi.org/10.1007/978-3-030-01768-2_3

  • DOI: https://doi.org/10.1007/978-3-030-01768-2_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-01767-5

  • Online ISBN: 978-3-030-01768-2

  • eBook Packages: Computer Science, Computer Science (R0)
