East European Conference on Advances in Databases and Information Systems

ADBIS 2015: Advances in Databases and Information Systems, pp 198-212

Web Content Management Systems Archivability

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9282)

Abstract

Web archiving is the process of collecting and preserving web content in an archive for current and future generations. A key problem in web archiving is that not all websites can be archived correctly, because of difficulties that arise from the use of different technologies, standards and implementation practices. Nevertheless, one of the common denominators of current websites is that they are implemented using a Web Content Management System (WCMS). We evaluate the Website Archivability (WA) of the most prevalent WCMSs. We investigate the extent to which each WCMS meets the conditions for a safe transfer of its content to a web archive for preservation purposes, and thus identify its strengths and weaknesses. More importantly, we deduce specific recommendations to improve the WA of each WCMS, aiming to advance the general practice of web data extraction and archiving.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Department of Informatics, Aristotle University, Thessaloniki, Greece
