Archiving Software Surrogates on the Web for Future Reference

  • Helge Holzmann
  • Wolfram Sperber
  • Mila Runnwerth
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9819)


Software has long been established as an essential aspect of the scientific process in mathematics and other disciplines. However, reliably referencing software in scientific publications remains challenging for various reasons. A crucial factor is that software is dynamic: its versions and states change over time and are difficult to capture. We propose to archive and reference surrogates instead, which can be found on the Web and reflect the actual software to a remarkable extent. Our study shows that about half of the webpages of software are already archived, with almost all of them including some kind of documentation.


Keywords: Scientific software management, Web archives, Analysis



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Helge Holzmann (1)
  • Wolfram Sperber (2)
  • Mila Runnwerth (3)
  1. L3S Research Center, Hannover, Germany
  2. zbMATH, FIZ Karlsruhe - Leibniz Institute for Information Infrastructure, Berlin, Germany
  3. German National Library of Science and Technology (TIB), Hannover, Germany
