Archiving the Hidden Web

  • Chapter
Web Archiving

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Masanès, J. (2006). Archiving the Hidden Web. In: Web Archiving. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-46332-0_5

  • DOI: https://doi.org/10.1007/978-3-540-46332-0_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23338-1

  • Online ISBN: 978-3-540-46332-0

  • eBook Packages: Computer Science, Computer Science (R0)
