
Automating data sharing through authoring tools

International Journal on Digital Libraries

Abstract

In the current scientific publishing landscape, there is a need for an authoring workflow that easily integrates data and code into manuscripts and that enables the data and code to be published in reusable form. Automatically embedding data and code in published output would improve both scientific communication and data archiving. In this work, we demonstrate a proof of concept for such a workflow, based on Emacs and org-mode, which provides this authoring capability and workflow integration. We illustrate the concept with a series of examples of potential uses. First, we use data on citation counts to compute the h-index of an author, and show two code examples for calculating the h-index. The source for each example is automatically embedded in the PDF when the document is exported. We then demonstrate how data can be embedded in image files, which are themselves embedded in the document. Finally, metadata about the embedded files can be automatically included in the exported PDF and accessed by computer programs: in our customized export, we stored metadata about the attached files in an Info field of the PDF. A computer program could parse this field to get a list of embedded files and carry out analyses on them. Authoring tools such as Emacs + org-mode can greatly facilitate the integration of data and code into technical writing, and can also automate the embedding of data into document formats intended for consumption.
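The h-index used in the first example follows Hirsch [3]: an author has index h if h of their papers have at least h citations each. The two code examples from the article are not reproduced in this version; the following is only a minimal Python sketch of that definition, and the function name `h_index` is hypothetical.

```python
def h_index(citations: list[int]) -> int:
    """Return the largest h such that h papers have at least h citations each."""
    h = 0
    # Rank papers from most to least cited; the paper at rank r counts toward h
    # only if it has at least r citations.
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count < rank:
            break
        h = rank
    return h


print(h_index([10, 8, 5, 4, 3]))  # 4
```

Sorting once and stopping at the first paper whose citation count falls below its rank keeps the calculation to a single pass over the ranked list.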


Fig. 1
Fig. 2


References

  1. Dominik, C.: The Org Mode 8 Reference Manual: Organize Your Life with GNU Emacs. Samurai Media Limited, Hong Kong (2014)


  2. Elsevier Content Innovations: Content innovation. http://www.elsevier.com/books-and-journals/content-innovation. Accessed 12 June 2015

  3. Hirsch, J.E.: An index to quantify an individual’s scientific research output. Proc. Natl. Acad. Sci. 102(46), 16,569–16,572 (2005). doi:10.1073/pnas.0507655102


  4. Jupyter: Project Jupyter. The Jupyter Project provides a web-browser based computational notebook with a range of computational backends including Python, Julia, R and others. http://jupyter.org/. Accessed 26 June 2015

  5. Kitchin, J.R.: Data sharing in surface science. Surf. Sci. (in press) (2015a). doi:10.1016/j.susc.2015.05.007, http://www.sciencedirect.com/science/article/pii/S0039602815001326

  6. Kitchin, J.R.: Examples of effective data sharing in scientific publishing. ACS Catal. 5(6), 3894–3899 (2015b). doi:10.1021/acscatal.5b00538


  7. Nature: Manuscript formatting guide. http://www.nature.com/nature/authors/gta/index.html#a5.11. Accessed 12 June 2015

  8. Pakin, S.: The attachfile package, v1.5b. http://www.ctan.org/tex-archive/macros/latex/contrib/attachfile. Accessed 26 June 2015

  9. PDF Labs: PDFtk the PDF toolkit. https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/. Accessed 26 June 2015

  10. Pérez, F., Granger, B.E.: IPython: a system for interactive scientific computing. Comput. Sci. Eng. 9(3), 21–29 (2007). doi:10.1109/MCSE.2007.53, http://ipython.org

  11. Schulte, E., Davison, D.: Active documents with org-mode. Comput. Sci. Eng. 13(3), 66–73 (2011). doi:10.1109/MCSE.2011.41


  12. Schulte, E., Davison, D., Dye, T., Dominik, C.: A multi-language computing environment for literate programming and reproducible research. J. Stat. Softw. 46(3), 1–24 (2012). http://www.jstatsoft.org/v46/i03

  13. Whitmire, A., Briney, K., Nurnberger, A., Henderson, M., Atwood, T., Janz, M., Kozlowski, W., Lake, S., Vandegrift, M., Zilinski, L.: A table summarizing the federal public access policies resulting from the US Office of Science and Technology Policy memorandum of February 2013. figshare (2015). doi:10.6084/m9.figshare.1372041

  14. Zilinski, L., Scherer, D., Bullock, D., Horton, D., Matthews, C.: Evolution of data creation, management, publication, and curation in the research process. Transp. Res. Rec. J. Transp. Res. Board 2414, 9–19 (2014). doi:10.3141/2414-02



Author information


Corresponding author

Correspondence to Lisa D. Zilinski.

Appendix


1.1 Embedding data in images

We use the steganopy (https://pypi.python.org/pypi/steganopy/0.0.1) Python package to illustrate how steganography can be used to hide data in an image. We do not claim that steganography is the ideal way to do this; the point is that our general approach is flexible. The embedded data could equally be XMP or other types of metadata.
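The steganopy calls used in the article are not reproduced in this version, so rather than guess at that package's API, the sketch below hand-rolls the basic least-significant-bit (LSB) technique on a raw sequence of 8-bit channel values (for example, the decoded RGB bytes of an image). The function names `embed_lsb` and `extract_lsb` and the 4-byte length header are assumptions for illustration. A real workflow would decode and re-encode the image with an imaging library and must save it in a lossless format such as PNG, because lossy compression destroys the low bits.

```python
def embed_lsb(pixels: bytes, payload: bytes) -> bytearray:
    """Hide payload in the least significant bit of each channel value.

    A 4-byte big-endian length header (an assumption of this sketch) is
    written first so the payload can be recovered without knowing its size.
    """
    data = len(payload).to_bytes(4, "big") + payload
    # Most significant bit of each byte first.
    bits = [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]
    if len(bits) > len(pixels):
        raise ValueError("image too small for payload")
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # changes each value by at most 1
    return out


def extract_lsb(pixels: bytes) -> bytes:
    """Recover a payload written by embed_lsb."""
    def read_bytes(start_bit: int, count: int) -> bytes:
        result = bytearray()
        for b in range(count):
            value = 0
            for i in range(8):
                value = (value << 1) | (pixels[start_bit + 8 * b + i] & 1)
            result.append(value)
        return bytes(result)

    length = int.from_bytes(read_bytes(0, 4), "big")
    return read_bytes(32, length)
```

For example, `extract_lsb(embed_lsb(cover, b"year,citations\n"))` returns the original CSV bytes, while every channel value in `cover` shifts by at most one, which is visually imperceptible.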


1.2 The custom export code

Here we define a custom table exporter. It uses the regular table export mechanism, but also saves the contents of each table as a CSV file. We define exports for two backends: LaTeX and HTML. For LaTeX, we use the attachfile [8] package to embed the data file in the PDF. For HTML, we insert into the output both a link to the data file and a data URI link that carries the data itself. We store the filename of each generated table in a global variable named *embedded-files* so that we can create a new Info metadata entry in the exported PDF.
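The exporter itself is Emacs Lisp and is not reproduced here. The following Python sketch mirrors only the logic described above, under stated assumptions: the table arrives as a list of rows, the hypothetical `export_table` returns just the extra markup that accompanies the normally rendered table, and the module-level `embedded_files` list stands in for *embedded-files*. The LaTeX output uses the `\attachfile` command from the attachfile package [8]; the HTML output carries both a plain link and a base64 `data:` URI, so the data remain available even if the HTML file is separated from the CSV file.

```python
import base64
import csv
import io

# Stands in for the global *embedded-files* list of the Emacs Lisp exporter.
embedded_files: list[str] = []


def export_table(rows: list[list[object]], filename: str, backend: str) -> str:
    """Save a table as CSV and return markup that attaches or links the file.

    Hypothetical helper: the rendered table itself is assumed to come from
    the standard exporter and is not generated here.
    """
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    text = buffer.getvalue()

    # newline="" keeps the csv module's own line terminators unchanged on disk.
    with open(filename, "w", newline="", encoding="utf-8") as handle:
        handle.write(text)
    embedded_files.append(filename)

    if backend == "latex":
        # \attachfile is provided by the attachfile LaTeX package.
        return f"\\attachfile{{{filename}}}"
    if backend == "html":
        encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
        data_uri = f"data:text/csv;base64,{encoded}"
        return (f'<a href="{filename}">{filename}</a> '
                f'<a href="{data_uri}">data URI</a>')
    raise ValueError(f"unsupported backend: {backend}")
```

Keeping the file list in one shared structure is what later makes it possible to write a single Info entry that names every attachment.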


Next, we define an exporter for source blocks. It writes each source block to its own file and puts links to that file in the exported output. As with tables, we store the filename of each generated source file in the global variable *embedded-files* so that we can create a new Info metadata entry in the exported PDF.
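A parallel Python sketch for source blocks follows; again it is not the authors' Emacs Lisp. The language-to-extension table, the function name `export_src_block`, and the choice of `.txt` for unknown languages are all hypothetical. As in the table sketch, LaTeX output attaches the file with `\attachfile` and HTML output links to it.

```python
# Stands in for the global *embedded-files* list.
embedded_files: list[str] = []

# Hypothetical mapping from Org source-block language names to extensions.
EXTENSIONS = {"python": ".py", "emacs-lisp": ".el", "sh": ".sh", "R": ".R"}


def export_src_block(code: str, name: str, language: str, backend: str) -> str:
    """Write a source block to its own file and return markup pointing to it."""
    filename = name + EXTENSIONS.get(language, ".txt")
    with open(filename, "w", encoding="utf-8") as handle:
        handle.write(code)
    embedded_files.append(filename)

    if backend == "latex":
        return f"\\attachfile{{{filename}}}"
    if backend == "html":
        return f'<a href="{filename}">{filename}</a>'
    raise ValueError(f"unsupported backend: {backend}")
```

Giving each file the extension of its language means a reader who extracts the attachment can open or run it directly with the matching tool.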


Here, we define derived back ends for HTML and LaTeX export. They are identical to the standard export back ends except for the modified behavior of the table and src-block elements.
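Org provides `org-export-define-derived-backend` for this kind of inheritance. The Python sketch below does not reproduce that function; it only mimics the idea with a dictionary of per-element translators, copying every translator from a parent back end and overriding just `table` and `src-block`. The element dictionaries, the `file` key, and all function names here are hypothetical.

```python
from typing import Callable, Dict

Element = dict
Translator = Callable[[Element], str]


def derive_backend(parent: Dict[str, Translator],
                   overrides: Dict[str, Translator]) -> Dict[str, Translator]:
    """Copy every translator from parent, replacing only those in overrides."""
    derived = dict(parent)
    derived.update(overrides)
    return derived


def export(elements: list, backend: Dict[str, Translator]) -> str:
    """Render each element with the translator registered for its type."""
    return "\n".join(backend[el["type"]](el) for el in elements)


def html_table(el: Element) -> str:
    rows = "".join("<tr>" + "".join(f"<td>{cell}</td>" for cell in row) + "</tr>"
                   for row in el["rows"])
    return f"<table>{rows}</table>"


# Hypothetical stand-in for the standard HTML back end.
html_backend: Dict[str, Translator] = {
    "paragraph": lambda el: f"<p>{el['text']}</p>",
    "table": html_table,
    "src-block": lambda el: f"<pre>{el['code']}</pre>",
}

# Derived back end: only tables and source blocks gain a link to their file.
data_html_backend = derive_backend(html_backend, {
    "table": lambda el: html_table(el) + f'<a href="{el["file"]}">{el["file"]}</a>',
    "src-block": lambda el: (html_backend["src-block"](el)
                             + f'<a href="{el["file"]}">{el["file"]}</a>'),
})
```

Because the derived back end starts as a copy of its parent, every element type it does not override (paragraphs here) is rendered exactly as before.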


Finally, here we run the command to generate the exported HTML manuscript.


In addition, we generate the LaTeX manuscript and convert it to PDF. After the PDF is created, we insert the new Info field into it.
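How the Info field is written is not shown in this version. One possibility, assuming the PDFtk tool [9] is used, is its `update_info` operation, which reads records in the same `InfoBegin`/`InfoKey`/`InfoValue` text format that `pdftk file.pdf dump_data` prints. The sketch below builds such a record and the corresponding command line, and parses the record back, which is what a downstream program would do to recover the list of embedded files. The key name `EmbeddedFiles`, the comma-separated value, and all function names are hypothetical choices; filenames that themselves contain ", " would need a different separator, and non-ASCII values would need PDFtk's `update_info_utf8` variant.

```python
def info_record(key: str, files: list[str]) -> str:
    """Return a PDFtk update_info record storing the file list under key."""
    return f"InfoBegin\nInfoKey: {key}\nInfoValue: {', '.join(files)}\n"


def pdftk_update_command(pdf_in: str, info_file: str, pdf_out: str) -> list[str]:
    """Argument list for: pdftk IN update_info INFO output OUT."""
    return ["pdftk", pdf_in, "update_info", info_file, "output", pdf_out]


def embedded_files_from_info(dump: str, key: str = "EmbeddedFiles") -> list[str]:
    """Parse dump_data-style text and return the files listed under key."""
    lines = dump.splitlines()
    for i, line in enumerate(lines):
        if line == f"InfoKey: {key}" and i + 1 < len(lines):
            value_line = lines[i + 1]
            if value_line.startswith("InfoValue: "):
                value = value_line[len("InfoValue: "):]
                return [name for name in value.split(", ") if name]
    return []
```

To apply the record, write it to a file and pass `pdftk_update_command(...)` to `subprocess.run(..., check=True)`; a reader program can later feed the text output of `dump_data` to `embedded_files_from_info`.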



About this article


Cite this article

Kitchin, J.R., Van Gulick, A.E. & Zilinski, L.D. Automating data sharing through authoring tools. Int J Digit Libr 18, 93–98 (2017). https://doi.org/10.1007/s00799-016-0173-7


  • DOI: https://doi.org/10.1007/s00799-016-0173-7
