PrEV: Preservation Explorer and Vault for Web 2.0 User-Generated Content

  • Anqi Cui
  • Liner Yang
  • Dejun Hou
  • Min-Yen Kan
  • Yiqun Liu
  • Min Zhang
  • Shaoping Ma
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7489)


We present the Preservation Explorer and Vault (PrEV) system, a city-centric multilingual digital library that archives Web 2.0 resources and makes them available, with the aim of storing a comprehensive record of urban lifestyle. To match the current state of the digital environment, a key architectural design choice in PrEV is to collaboratively archive not only Web 1.0 web pages but also multilingual Web 2.0 resources, including multimedia, real-time microblog content, and mobile application descriptions (e.g., iPhone apps). PrEV preserves such resources for posterity and makes them available for programmatic retrieval by third-party agents and for exploration by scholars through its user interface.


Keywords: Preservation, Archive, Visualization, API, Web 2.0, User-Generated Content, NExT, PrEV





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Anqi Cui (1)
  • Liner Yang (1)
  • Dejun Hou (2)
  • Min-Yen Kan (2)
  • Yiqun Liu (1)
  • Min Zhang (1)
  • Shaoping Ma (1)
  1. State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing, China
  2. School of Computing, National University of Singapore, Singapore
