Automating data sharing through authoring tools

Kitchin, John R.; Van Gulick, Ana E.; Zilinski, Lisa D.

doi:10.1007/s00799-016-0173-7

Published: 11 June 2016

Volume 18, pages 93–98, (2017)

International Journal on Digital Libraries

John R. Kitchin¹,
Ana E. Van Gulick²,³ &
Lisa D. Zilinski²

Abstract

In the current scientific publishing landscape, there is a need for an authoring workflow that easily integrates data and code into manuscripts and that enables the data and code to be published in reusable form. Automated embedding of data and code into published output will enable superior communication and data archiving. In this work, we demonstrate a proof of concept for a workflow, org-mode, which successfully provides this authoring capability and workflow integration. We illustrate this concept in a series of examples for potential uses of this workflow. First, we use data on citation counts to compute the h-index of an author, and show two code examples for calculating the h-index. The source for each example is automatically embedded in the PDF during the export of the document. We demonstrate how data can be embedded in image files, which themselves are embedded in the document. Finally, metadata about the embedded files can be automatically included in the exported PDF, and accessed by computer programs. In our customized export, we embedded metadata about the attached files in the PDF in an Info field. A computer program could parse this output to get a list of embedded files and carry out analyses on them. Authoring tools such as Emacs + org-mode can greatly facilitate the integration of data and code into technical writing. These tools can also automate the embedding of data into document formats intended for consumption.
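
The h-index calculation mentioned above can be sketched in a few lines of Python. This is a minimal illustration of the definition (the largest h such that h papers have at least h citations each), not a reproduction of either example in the article; the function name and the citation counts are made up for illustration.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Made-up citation counts, for illustration only (not data from the article).
example_counts = [10, 8, 5, 4, 3, 0]
print(h_index(example_counts))
```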

References

Dominik, C.: The Org Mode 8 Reference Manual: Organize Your Life with GNU Emacs. Samurai Media Limited, Hong Kong (2014)
Elsevier Content Innovations: Content innovation. http://www.elsevier.com/books-and-journals/content-innovation. Accessed 12 June 2015
Hirsch, J.E.: An index to quantify an individual’s scientific research output. Proc. Natl. Acad. Sci. 102(46), 16569–16572 (2005). doi:10.1073/pnas.0507655102
Jupyter: Project Jupyter. The Jupyter Project provides a web-browser based computational notebook with a range of computational backends including Python, Julia, R and others. http://jupyter.org/. Accessed 26 June 2015
Kitchin, J.R.: Data sharing in surface science. Surf. Sci. (in press) (2015a). doi:10.1016/j.susc.2015.05.007, http://www.sciencedirect.com/science/article/pii/S0039602815001326
Kitchin, J.R.: Examples of effective data sharing in scientific publishing. ACS Catal. 5(6), 3894–3899 (2015b). doi:10.1021/acscatal.5b00538
Nature: Manuscript formatting guide. http://www.nature.com/nature/authors/gta/index.html#a5.11. Accessed 12 June 2015
Pakin, S.: http://www.ctan.org/tex-archive/macros/latex/contrib/attachfile, v1.5b. Accessed 26 June 2015
PDF Labs: PDFtk the PDF toolkit. https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/. Accessed 26 June 2015
Pérez, F., Granger, B.E.: IPython: a system for interactive scientific computing. Comput. Sci. Eng. 9(3), 21–29 (2007). doi:10.1109/MCSE.2007.53, http://ipython.org
Schulte, E., Davison, D.: Active documents with org-mode. Comput. Sci. Eng. 13(3), 66–73 (2011). doi:10.1109/MCSE.2011.41
Schulte, E., Davison, D., Dye, T., Dominik, C.: A multi-language computing environment for literate programming and reproducible research. J. Stat. Softw. 46(3), 1–24 (2012). http://www.jstatsoft.org/v46/i03
Whitmire, A., Briney, K., Nurnberger, A., Henderson, M., Atwood, T., Janz, M., Kozlowski, W., Lake, S., Vandegrift, M., Zilinski, L.: A table summarizing the federal public access policies resulting from the US Office of Science and Technology Policy memorandum of February 2013. figshare (2015). doi:10.6084/m9.figshare.1372041
Zilinski, L., Scherer, D., Bullock, D., Horton, D., Matthews, C.: Evolution of data creation, management, publication, and curation in the research process. Transp. Res. Rec. J. Transp. Res. Board 2414, 9–19 (2014). doi:10.3141/2414-02

Author information

Authors and Affiliations

Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
John R. Kitchin
University Libraries, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
Ana E. Van Gulick & Lisa D. Zilinski
Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, PA, USA
Ana E. Van Gulick

Corresponding author

Correspondence to Lisa D. Zilinski.

Appendix

1.1 Embedding data in images

We use the steganopy (https://pypi.python.org/pypi/steganopy/0.0.1) Python package to illustrate the use of steganography to put data in an image. The point is not that steganography is an ideal way to do this, but that our general approach is flexible. The embedded data could be XMP, or other types of metadata.
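
As a rough illustration of the idea, the sketch below hides a small data payload in the least significant bit of each pixel byte and reads it back. It is plain Python working on a raw byte buffer and does not use or imitate the steganopy API; the function names, the 4-byte length header, and the sample payload are all assumptions made for this example.

```python
def embed(pixels: bytes, payload: bytes) -> bytes:
    """Hide payload in the least significant bit of successive pixel bytes.

    A 4-byte big-endian length header is stored before the payload so the
    reader knows how many bytes to recover.
    """
    message = len(payload).to_bytes(4, "big") + payload
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("payload too large for this image")
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # overwrite only the lowest bit
    return bytes(out)


def extract(pixels: bytes) -> bytes:
    """Recover a payload written by embed()."""
    def read_bytes(start_bit: int, n_bytes: int) -> bytes:
        value = bytearray()
        for b in range(n_bytes):
            byte = 0
            for k in range(8):
                byte = (byte << 1) | (pixels[start_bit + 8 * b + k] & 1)
            value.append(byte)
        return bytes(value)

    length = int.from_bytes(read_bytes(0, 4), "big")
    return read_bytes(32, length)


# A stand-in for decoded image data: 1024 arbitrary pixel bytes.
cover = bytes(range(256)) * 4
stego = embed(cover, b"x,y\n1,2\n")  # made-up CSV payload
print(extract(stego).decode())
```

Each pixel byte changes by at most one unit, which is why the altered image looks the same to a viewer. A real image would first have to be decoded to raw pixel values and saved in a lossless format, since lossy compression would destroy the hidden bits.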

1.2 The custom export code

Here we define a custom table exporter. We use the regular table export mechanism, but also save the contents of the table as a CSV file. We define exports for two back ends: LaTeX and HTML. For LaTeX, we use the attachfile [8] package to embed the data file in the PDF. For HTML, we insert a link to the data file and a data URI link into the HTML output. We store the filename of each generated table in a global variable named *embedded-files* so we can create a new Info metadata entry in the exported PDF.
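
The authors' exporter is written for org-mode's export machinery in Emacs; the following is only a Python analogue of the behavior described above, offered as a sketch. The function names, the `embedded_files` list, and the HTML link text are assumptions; `\attachfile{...}` is the command provided by the attachfile LaTeX package.

```python
import base64
import csv
import io

# Hypothetical stand-in for the *embedded-files* variable described above.
embedded_files = []


def table_to_csv(rows):
    """Serialize a list of rows to CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def export_table(rows, filename, backend):
    """Write the table to a CSV file and return markup that attaches or links it.

    Only the extra attachment markup is returned; the regular rendering of
    the table itself is left to the standard exporter.
    """
    text = table_to_csv(rows)
    with open(filename, "w", newline="") as handle:
        handle.write(text)
    embedded_files.append(filename)  # remembered for the PDF Info entry

    if backend == "latex":
        # attachfile embeds the named file in the PDF at this point.
        return "\\attachfile{%s}" % filename
    if backend == "html":
        encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
        return (
            '<a href="{0}">{0}</a> '
            '<a href="data:text/csv;base64,{1}">data uri</a>'
        ).format(filename, encoded)
    raise ValueError("unsupported backend: %s" % backend)
```

The data URI variant keeps the HTML file self-contained, at the cost of a larger file, while the plain link requires the CSV file to travel with the page.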

Next, we define an exporter for source blocks. We will write these to a file too, and put links to them in the exported files. We store the filename of each generated source file in a global variable named *embedded-files* so we can create a new Info metadata entry in the exported PDF.

Here, we define a derived back end for HTML and LaTeX export. These are identical to the standard export back ends, except for the modified behavior of the table and src-block elements.
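
The notion of a derived back end can be pictured as a table of per-element handlers that is copied from a parent and selectively overridden. The Python sketch below illustrates that pattern only; it is not org-mode's implementation, and every name in it (the handler functions, `derive_backend`, and the `table.csv` link target) is hypothetical.

```python
import html


def render_paragraph(text):
    return "<p>%s</p>" % html.escape(text)


def render_table(rows):
    body = "".join(
        "<tr>%s</tr>" % "".join("<td>%s</td>" % html.escape(str(c)) for c in row)
        for row in rows
    )
    return "<table>%s</table>" % body


def render_src_block(code):
    return "<pre>%s</pre>" % html.escape(code)


# The "standard" back end: one handler per element type.
STANDARD_HTML = {
    "paragraph": render_paragraph,
    "table": render_table,
    "src-block": render_src_block,
}


def derive_backend(parent, overrides):
    """Copy the parent's handlers and replace only the overridden ones."""
    derived = dict(parent)
    derived.update(overrides)
    return derived


def table_with_link(rows):
    # Reuse the standard rendering, then append a link to the data file.
    return render_table(rows) + '<a href="table.csv">data</a>'


CUSTOM_HTML = derive_backend(STANDARD_HTML, {"table": table_with_link})


def export(elements, backend):
    """Render (kind, content) pairs with the handlers of the given back end."""
    return "\n".join(backend[kind](content) for kind, content in elements)
```

Because only the `table` entry is replaced, every other element renders exactly as it does in the parent back end.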

Finally, here we run the command to generate the exported HTML manuscript.

In addition, here we generate the LaTeX manuscript, and then convert it to PDF. After the PDF is created, we insert the new InfoField into the PDF.
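
A program could write and later read back such an Info entry along the following lines. This sketch assumes the pdftk text format for Info records (`InfoBegin`, `InfoKey:`, `InfoValue:`), which can differ between pdftk versions; the key name `EmbeddedFiles`, the semicolon-separated value, and the file names in the comments are assumptions for illustration, not the authors' choices.

```python
INFO_KEY = "EmbeddedFiles"  # hypothetical key name


def build_info_record(filenames):
    """Build a record that could be applied with, for example:
    pdftk manuscript.pdf update_info info.txt output manuscript-info.pdf
    (file names here are placeholders). Names must not contain ';'.
    """
    return "InfoBegin\nInfoKey: {}\nInfoValue: {}\n".format(
        INFO_KEY, ";".join(filenames)
    )


def parse_embedded_files(dump_text):
    """Extract the file list from text such as `pdftk file.pdf dump_data` prints.

    Character escaping that pdftk may apply to values is not handled here.
    """
    key = None
    for line in dump_text.splitlines():
        if line.startswith("InfoKey: "):
            key = line[len("InfoKey: "):]
        elif line.startswith("InfoValue: ") and key == INFO_KEY:
            value = line[len("InfoValue: "):]
            return [name for name in value.split(";") if name]
    return []


record = build_info_record(["table-1.csv", "h-index.py"])  # made-up names
print(parse_embedded_files(record))
```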

About this article

Cite this article

Kitchin, J.R., Van Gulick, A.E. & Zilinski, L.D. Automating data sharing through authoring tools. Int J Digit Libr 18, 93–98 (2017). https://doi.org/10.1007/s00799-016-0173-7

Received: 29 June 2015
Revised: 25 April 2016
Accepted: 10 May 2016
Published: 11 June 2016
Issue Date: June 2017
DOI: https://doi.org/10.1007/s00799-016-0173-7
