Archiving the Hidden Web

Masanès, Julien

doi:10.1007/978-3-540-46332-0_5

References

Adams, K. C. (2001). The Web as Database: New Extraction Technologies and Content Management. Online, March
Agichtein, E., Ipeirotis, P. G., & Gravano, L. (2003). Modeling Query-Based Access to Text Databases
Barbosa, L. & Freire, J. (2004). Siphoning Hidden-Web Data through Keyword-Based Interfaces. Paper presented at the SBBD
Bergman, M. K. (2001). The Deep Web: Surfacing Hidden Value. The Journal of Electronic Publishing, 7(1)
Boston, T. (2005). Exposing the deep web to increase access to library collections. Paper presented at AusWeb05, the Twelfth Australasian World Wide Web Conference, Queensland, Australia
Boufkhad, Y. & Viennot, L. (2003). The Observable Web. RR
Boyko, A. (2004). Test Bed Taxonomy. IIPC Reports, 16
Brandman, O., Cho, J., Garcia-Molina, H., & Shivakumar, N. (2000). Crawler-Friendly Web Servers. SIGMETRICS Performance Evaluation Review, 28(2), 9-14
Callan, J. & Connell, M. (2001). Query-based sampling of text databases. ACM Transactions on Information Systems, 19(2), 97-130
Castillo, C. (2004). Effective Web Crawling. University of Chile
Chang, K. C.-C., He, B., Li, C., Patel, M., & Zhang, Z. (2004). Structured databases on the web: observations and implications. SIGMOD Record, 33(3), 61-70
Cope, J., Craswell, N., & Hawking, D. (2003). Automated discovery of search interfaces on the web. Paper presented at the Proceedings of the Fourteenth Australasian Database Conference on Database Technologies 2003
Florescu, D., Levy, A., & Mendelzon, A. (1998). Database techniques for the World-Wide Web: A survey. SIGMOD Record, 27, 59-74
Frankewitsch, T. & Prokosch, U. (2001). Navigation in medical Internet image databases. Medical Informatics and the Internet in Medicine, 26(1), 1-15
Gravano, L., Ipeirotis, P. G., & Sahami, M. (2003). QProber: A System for Automatic Classification of Hidden-Web Databases. ACM Transactions on Information Systems, 21(1)
He, H., Meng, W., Yu, C., & Wu, Z. (2005). WISE-Integrator: a system for extracting and integrating complex web search interfaces of the deep web. Trondheim, Norway
Hearst, M. (1998). Information Integration. IEEE Intelligent Systems, 13(5), 12-24
HTTrack. http://www.httrack.com/
Lage, J. P., Silva, A. S. D., Golgher, P. B., & Laender, A. H. F. (2002). Collecting hidden Web pages for data extraction. Paper presented at the Proceedings of the Fourth International Workshop on Web Information and Data Management
Lagoze, C. & Van de Sompel, H. (2001). The open archives initiative: building a low-barrier interoperability framework. Roanoke, Virginia, United States
Lawrence, S. & Giles, C. L. (1999). Accessibility of Information on the Web. Nature, 400, 107-109
Liddle, W. S., Yau, S. H., & Embley, D. W. (2002). On the Automatic Extraction of Data from the Hidden Web. Springer, Berlin Heidelberg New York
Liu, X., Maly, K., Zubair, M., & Nelson, M. (2002). DP9 - an OAI gateway service for Web crawlers. Paper presented at the Second ACM/IEEE Joint Conference on Digital Libraries
Ludäscher, B. & Gupta, A. (1999). Modeling Interactive Web Sources for Information Mediation. Paper presented at the Intl. Workshop on the World-Wide Web and Conceptual Modeling (WWWCM’99), Paris
Marill, J., Boyko, A., & Ashenfelder, M. (2004). Web Harvesting Survey, 10
Masanès, J. (2002). Archiving the deep web. Paper presented at the 2nd International Workshop on Web Archives (IWAW’02), Roma, Italy
Mohr, G., Kimpton, M., Stack, M., & Ranitovic, I. (2004). Introduction to Heritrix, an archival quality web crawler. Paper presented at the 4th International Web Archiving Workshop (IWAW’04), Bath, UK
Ntoulas, A., Zerfos, P., & Cho, J. (2005). Downloading textual hidden web content through keyword queries. Denver, CO, USA
Raghavan, S. & Garcia-Molina, H. (2001). Crawling the Hidden Web. Paper presented at the Proceedings of the 27th International Conference on Very Large Data Bases
Roche, X. (2006). Copying web sites. In J. Masanès (Ed.), Web Archiving. Springer, Berlin Heidelberg New York
Storey, M.-A. & Jahnke, J. H. (1999). Web site evolution - Towards a flexible integration of data and its representation. Paper presented at the 1st International Workshop on Web Site Evolution (WSE’99), Atlanta, USA
Zhang, Z., He, B., & Chang, K. C.-C. (2004). Understanding Web query interfaces: Best-effort parsing with hidden syntax. Paper presented at the Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data

Author information

Authors and Affiliations

European Web Archive, 25 Rue des Envierges, 75020, Paris, France
Julien Masanès

About this chapter

Cite this chapter

Masanès, J. (2006). Archiving the Hidden Web. In: Web Archiving. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-46332-0_5

DOI: https://doi.org/10.1007/978-3-540-46332-0_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23338-1
Online ISBN: 978-3-540-46332-0
eBook Packages: Computer Science; Computer Science (R0)