Structured Data on the Web

Halevy, Alon Y.

doi:10.1007/978-3-642-04941-5_2

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5831)

Included in the following conference series:

International Conference on Next Generation Information Technologies and Systems

Abstract

Though search on the World-Wide Web has focused mostly on unstructured text, there is an increasing amount of structured data on the Web and growing interest in harnessing such data. I will describe several current projects at Google whose overall goal is to leverage structured data and better expose it to our users.

The first project is on crawling the deep web. The deep web refers to content that resides in databases behind forms, but is unreachable by search engines because there are no links to these pages. I will describe a system that surfaces pages from the deep web by guessing queries to submit to these forms, and entering the results into the Google index [1]. The pages that we generated using this system come from millions of forms, hundreds of domains and over 40 languages. Pages from the deep web are served in the top-10 results on google.com for over 1000 queries per second.
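
The idea of surfacing pages by guessing form queries can be illustrated with a minimal sketch. It is not the system described in [1], which chooses inputs and values far more carefully; it only shows how guessed values for a form's inputs turn into URLs a crawler could fetch. The form URL, the input names, and the `candidate_urls` helper are all hypothetical, and the form is assumed to submit via GET.

```python
from itertools import islice, product
from urllib.parse import urlencode


def candidate_urls(action_url, candidate_values, limit=100):
    """Yield GET URLs for combinations of guessed form-input values.

    candidate_values maps each (hypothetical) input name to the values we
    are willing to try; `limit` caps how many submissions are generated.
    """
    names = sorted(candidate_values)
    combos = product(*(candidate_values[name] for name in names))
    for combo in islice(combos, limit):
        # Each combination of guessed values becomes one query string.
        yield action_url + "?" + urlencode(dict(zip(names, combo)))


if __name__ == "__main__":
    # Hypothetical used-car search form with two inputs.
    guesses = {"make": ["honda", "ford"], "zip": ["94043", "10001"]}
    for url in candidate_urls("https://example.com/search", guesses):
        print(url)
```

The cap matters because the number of combinations grows multiplicatively with the number of inputs; a real crawler would also need to decide which generated pages are worth indexing.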

The second project considers the collection of HTML tables on the web. The WebTables Project [2] built a corpus of over 150 million tables from HTML tables on the Web. The WebTables System addresses the challenges of extracting these tables from the Web, and offers search over this collection of tables. The project also illustrates the potential of leveraging the collection of schemas of these tables.
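
As a rough illustration of the extraction step, the sketch below pulls the cell text out of HTML tables using only the Python standard library. It is an assumption-laden toy, not the WebTables pipeline from [2]: it does not handle nested tables, it does not try to separate relational tables from tables used for page layout, and treating the first row as a candidate schema (the hypothetical `guess_schema` helper) is only a simple heuristic.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect each <table> as a list of rows, each row a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None  # drop any cell left unclosed in this row


def extract_tables(html):
    """Return every table found in the HTML string."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


def guess_schema(table):
    """Heuristic assumption: the first row holds the column names."""
    return table[0] if table else []
```

Applied across many pages, the rows returned by `extract_tables` would form a table corpus, and the outputs of `guess_schema` would form the kind of schema collection the abstract mentions.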

Finally, I’ll discuss current work on computing aspects of queries in order to better organize search results for exploratory queries.

References

[1] Madhavan, J., Ko, D., Kot, L., Ganapathy, V., Rasmussen, A., Halevy, A.: Google’s deep-web crawl. In: Proc. of VLDB, pp. 1241–1252 (2008)
[2] Cafarella, M.J., Halevy, A., Zhang, Y., Wang, D.Z., Wu, E.: WebTables: Exploring the Power of Tables on the Web. In: VLDB (2008)

Author information

Authors and Affiliations

Google Inc., 1600 Amphitheatre Parkway, Mountain View, California, 94043, USA
Alon Y. Halevy

Editor information

Editors and Affiliations

IBM Haifa Research Lab, Haifa University Campus, Mount Carmel, 31905, Haifa, Israel
Yishai A. Feldman
Department of Computer Science, U.S. Air Force Academy, 2354 Fairchild Drive, Suite 6G-101, 80840, Colorado Springs, Colorado, United States
Donald Kraft
Management Information Systems Department, The University of Haifa, Mount Carmel, 31905, Haifa, Israel
Tsvi Kuflik

About this paper

Cite this paper

Halevy, A.Y. (2009). Structured Data on the Web. In: Feldman, Y.A., Kraft, D., Kuflik, T. (eds) Next Generation Information Technologies and Systems. NGITS 2009. Lecture Notes in Computer Science, vol 5831. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04941-5_2

DOI: https://doi.org/10.1007/978-3-642-04941-5_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04940-8
Online ISBN: 978-3-642-04941-5
eBook Packages: Computer Science, Computer Science (R0)
