Blog Preservation: Current Challenges and a New Paradigm

  • Vangelis Banos
  • Nikos Baltas
  • Yannis Manolopoulos
Conference paper
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 141)


Blogging is a popular and prominent application of the Web 2.0 era. According to recent measurements, which are often considered conservative, there are more than 152 million blogs worldwide, with content spanning every aspect of life and science. This volume of content calls for long-term blog preservation and knowledge management. In this work, we present a range of issues that arise in blog preservation. We argue that current web archiving solutions cannot capture the dynamic and continuously evolving nature of blogs, their network and social structure, or the exchange of concepts and ideas that they foster. Furthermore, we provide directions and objectives toward robust digital preservation, management and dissemination facilities for blogs. Finally, we introduce the EC-funded BlogForever project, its main motivation, and its findings towards widening the scope of blog preservation.


Blogs, Blog preservation, Web archiving



The research leading to these results has received funding from the European Commission Framework Programme 7 (FP7), BlogForever project, grant agreement No. 269963. We would also like to thank all BlogForever project partners for their invaluable contributions to the project.



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Vangelis Banos (1)
  • Nikos Baltas (2)
  • Yannis Manolopoulos (1)
  1. Department of Informatics, Aristotle University, Thessaloniki, Greece
  2. Department of Computing, Imperial College, London, UK