Abstract
This chapter gives an overview of Google web-search, one of the most ubiquitous and influential information systems available today: it stores and processes huge volumes of diverse information and scales to cope with vast numbers of users, yet still produces high-quality results for each search query they type.
As well as being useful on a day-to-day basis and easy to grasp, it is an excellent example because the techniques used are based on fairly introductory Mathematics. Moving step-by-step through the application of basic graph and probability theory, the chapter acts as an introduction to both the data structures and algorithms that underpin the system as a whole.
Notes
- 1.
The chapter assumes you have at least some exposure to these topics, and focuses on explaining how they are used. However, we include a number of fairly lengthy introductions in case you need a refresher or even a place to start learning about them from scratch.
- 2.
A copy of the video is preserved at http://www.dougengelbart.org/firsts/dougs-1968-demo.html.
- 3.
A copy of the post is preserved at http://groups.google.com/groups?selm=6487%40cernvax.cern.ch.
- 4.
A copy of the web-page is preserved at http://www.w3.org/History/19921103-hypertext/hypertext/DataSources/WWW/Servers.html.
- 5.
Why search? This terminology relates to how BFS and DFS are often used, namely to search for a target vertex within the graph: once the target vertex v is visited, the traversal usually stops rather than continuing to visit all vertices.
- 6.
This also explains why in various descriptions, including the original research paper, p is termed a damping factor (or damping ratio); this term stems from the description of a similar feature of physical systems [6].
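As an illustration of note 5, the following is a minimal sketch of BFS used as a search, written in Python. The function name `bfs_search`, the adjacency-list dictionary `web`, and the vertex labels are hypothetical and chosen only for this example; they do not come from the chapter. The point it shows is that the traversal returns as soon as the target vertex is visited, so later vertices are never reached.

```python
from collections import deque

def bfs_search(graph, start, target):
    """Breadth-first search that stops once `target` is visited.

    `graph` is assumed to be a dict mapping each vertex to a list of
    the vertices it links to (an adjacency list).
    Returns the vertices in the order they were visited, or None if
    `target` cannot be reached from `start`.
    """
    order = []               # vertices in the order they are visited
    seen = {start}           # vertices already placed on the queue
    queue = deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        if v == target:
            return order     # stop early: the target has been visited
        for w in graph.get(v, []):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return None              # whole reachable graph visited, no target

# A small hypothetical graph: A links to B and C, and so on.
web = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}

print(bfs_search(web, "A", "C"))  # D and E are never visited
print(bfs_search(web, "A", "Z"))  # no such vertex, so every vertex is visited
```

A full traversal would simply omit the early `return`, which is the difference the note describes.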
References
Wikipedia: AdSense. https://en.wikipedia.org/wiki/AdSense
Wikipedia: AdWords. http://en.wikipedia.org/wiki/AdWords
Wikipedia: Breadth-first search. https://en.wikipedia.org/wiki/Breadth-first_search
Wikipedia: Conditional probability. http://en.wikipedia.org/wiki/Conditional_probability
Wikipedia: Cycle. http://en.wikipedia.org/wiki/Cycle_(graph_theory)
Wikipedia: Damping ratio. http://en.wikipedia.org/wiki/Damping_ratio
Wikipedia: Depth-first search. http://en.wikipedia.org/wiki/Depth-first_search
Wikipedia: Graph. http://en.wikipedia.org/wiki/Graph_(mathematics)
Wikipedia: Graph theory. http://en.wikipedia.org/wiki/Graph_theory
Wikipedia: Graph traversal. http://en.wikipedia.org/wiki/Graph_traversal
Wikipedia: Hyperlink. http://en.wikipedia.org/wiki/Hyperlink
Wikipedia: Hyperlink-Induced Topic Search (HITS). http://en.wikipedia.org/wiki/HITS_algorithm
Wikipedia: Hypertext. http://en.wikipedia.org/wiki/Hypertext
Wikipedia: HyperText Mark-up Language (HTML). http://en.wikipedia.org/wiki/HTML
Wikipedia: HyperText Transfer Protocol (HTTP). http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol
Wikipedia: Impact factor. http://en.wikipedia.org/wiki/Impact_factor
Wikipedia: Information retrieval. http://en.wikipedia.org/wiki/Information_retrieval
Wikipedia: Iterative method. http://en.wikipedia.org/wiki/Iterative_method
Wikipedia: Mark-up language. http://en.wikipedia.org/wiki/Markup_language
Wikipedia: Meta element. http://en.wikipedia.org/wiki/Meta_element
Wikipedia: Newton-Raphson method. https://en.wikipedia.org/wiki/Newton's_method
Wikipedia: On-Line System (NLS). http://en.wikipedia.org/wiki/NLS_(computer_system)
Wikipedia: Online and offline. http://en.wikipedia.org/wiki/Online_and_offline
Wikipedia: Open Directory Project (ODP). http://en.wikipedia.org/wiki/Open_Directory_Project
Wikipedia: PageRank. http://en.wikipedia.org/wiki/PageRank
Wikipedia: Pre-computation. http://en.wikipedia.org/wiki/Precomputation
Wikipedia: Precision. http://en.wikipedia.org/wiki/Precision_(computer_science)
Wikipedia: Random walk. http://en.wikipedia.org/wiki/Random_walk
Wikipedia: Robots exclusion standard. http://en.wikipedia.org/wiki/Robots_Exclusion_Standard
Wikipedia: Search engine optimization. http://en.wikipedia.org/wiki/Search_engine_optimization
Wikipedia: System of linear equations. https://en.wikipedia.org/wiki/System_of_linear_equations
Wikipedia: The mother of all demos. http://en.wikipedia.org/wiki/The_Mother_of_All_Demos
Wikipedia: Top Level Domain (TLD). http://en.wikipedia.org/wiki/Top-level_domain
Wikipedia: Tree. https://en.wikipedia.org/wiki/Tree_(graph_theory)
Wikipedia: Web-crawler. http://en.wikipedia.org/wiki/Web_crawler
Wikipedia: Web-directory. http://en.wikipedia.org/wiki/Web_directory
Wikipedia: Web-graph. http://en.wikipedia.org/wiki/Webgraph
Wikipedia: Web indexing. http://en.wikipedia.org/wiki/Web_indexing
Wikipedia: Web-search engine. http://en.wikipedia.org/wiki/Web_search_engine
Wikipedia: Web-search query. http://en.wikipedia.org/wiki/Web_search_query
Wikipedia: World Wide Web. http://en.wikipedia.org/wiki/World_Wide_Web
Wikipedia: World Wide Web Virtual Library (WWWVL). http://en.wikipedia.org/wiki/World_Wide_Web_Virtual_Library
Copyright information
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Page, D., Smart, N. (2014). Demystifying Web-Search: the Mathematics of PageRank. In: What Is Computer Science?. Undergraduate Topics in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-04042-4_5
DOI: https://doi.org/10.1007/978-3-319-04042-4_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04041-7
Online ISBN: 978-3-319-04042-4
eBook Packages: Computer Science, Computer Science (R0)