Automating data sharing through authoring tools

  • John R. Kitchin
  • Ana E. Van Gulick
  • Lisa D. Zilinski

Abstract

In the current scientific publishing landscape, there is a need for an authoring workflow that easily integrates data and code into manuscripts and that enables the data and code to be published in reusable form. Automated embedding of data and code into published output will enable superior communication and data archiving. In this work, we demonstrate a proof of concept for such a workflow, built on org-mode, which provides this authoring capability and workflow integration. We illustrate the concept with a series of examples of potential uses. First, we use data on citation counts to compute the h-index of an author, and show two code examples for calculating the h-index. The source for each example is automatically embedded in the PDF during the export of the document. Second, we demonstrate how data can be embedded in image files, which themselves are embedded in the document. Finally, metadata about the embedded files can be automatically included in the exported PDF and accessed by computer programs: in our customized export, we embedded this metadata in an Info field of the PDF, which a computer program could parse to get a list of embedded files and carry out analyses on them. Authoring tools such as Emacs + org-mode can greatly facilitate the integration of data and code into technical writing. These tools can also automate the embedding of data into document formats intended for consumption.
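As an illustration of the h-index calculation mentioned in the abstract, the following is a minimal sketch in Python. It is not the code from the article; it assumes the standard definition (the largest h such that h papers have at least h citations each), and the function name `h_index` and the sample citation counts are hypothetical.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            # The paper at this rank still has enough citations.
            h = rank
        else:
            # Counts only decrease from here, so no later rank can qualify.
            break
    return h


# Hypothetical citation counts, not data from the article.
sample_counts = [25, 8, 5, 3, 3, 1, 0]
print(h_index(sample_counts))  # prints 3
```

In an org-mode document, a block like this could sit next to the citation data it consumes, so that the exported PDF carries both the data and the code that produced the reported value.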

Keywords

Data sharing, Embedding, Org-mode, Authoring

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • John R. Kitchin (1)
  • Ana E. Van Gulick (2, 3)
  • Lisa D. Zilinski (2)

  1. Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, USA
  2. University Libraries, Carnegie Mellon University, Pittsburgh, USA
  3. Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, USA
