Demystifying Web-Search: the Mathematics of PageRank

  • Chapter in: What Is Computer Science?

Part of the book series: Undergraduate Topics in Computer Science (UTICS)

Abstract

This chapter gives an overview of Google web-search, one of the most ubiquitous and influential information systems available today: it stores and processes huge volumes of diverse information and scales to cope with huge numbers of users, yet produces high-quality results for each search query they type.

Aside from being useful on a day-to-day basis and easy to grasp, it is an excellent example because the techniques used are based on fairly introductory mathematics. Moving step-by-step through the application of basic graph and probability theory, the chapter acts as an introduction to both the data structures and the algorithms that underpin the system as a whole.

Notes

  1. The chapter assumes you have at least some exposure to these topics, and focuses on explaining how they are used. However, we include a number of fairly lengthy introductions in case you need a refresher or even a place to start learning about them from scratch.

  2. A copy of the video is preserved at http://www.dougengelbart.org/firsts/dougs-1968-demo.html.

  3. A copy of the post is preserved at http://groups.google.com/groups?selm=6487%40cernvax.cern.ch.

  4. A copy of the web-page is preserved at http://www.w3.org/History/19921103-hypertext/hypertext/DataSources/WWW/Servers.html.

  5. Why search? This terminology relates to how BFS and DFS are often used, namely to search for a target vertex within the graph: once the target vertex v is visited, the traversal usually stops rather than continuing to visit all vertices.

  6. This also explains why in various descriptions, including the original research paper, p is termed a damping factor (or damping ratio); the term stems from the description of a similar feature of physical systems [6].
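
The early-stopping behaviour described in note 5 can be illustrated with a minimal sketch of breadth-first search. The function name bfs_search, the adjacency-list dictionary example_graph, and the vertex labels are all hypothetical choices made for this illustration; they are not taken from the chapter.

```python
from collections import deque


def bfs_search(graph, start, target):
    """Breadth-first search that stops as soon as target is visited.

    graph is assumed to be a dict mapping each vertex to a list of its
    neighbours (a hypothetical representation chosen for this sketch).
    Returns (found, order), where order lists the vertices in the
    order they were visited.
    """
    seen = {start}          # vertices already placed on the queue
    queue = deque([start])  # FIFO queue gives breadth-first order
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        if v == target:
            # Target reached: stop rather than visit remaining vertices.
            return True, order
        for w in graph.get(v, []):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    # Queue exhausted: every reachable vertex was visited without success.
    return False, order


# Hypothetical five-vertex directed graph.
example_graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
found, order = bfs_search(example_graph, "a", "c")
print(found, order)
```

Searching for "c" from "a" ends after three vertices have been visited, so "d" and "e" are never reached; only a search for an absent vertex visits everything reachable from the start.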

References

  1. Wikipedia: AdSense. https://en.wikipedia.org/wiki/AdSense

  2. Wikipedia: AdWords. http://en.wikipedia.org/wiki/AdWords

  3. Wikipedia: Breadth-first search. https://en.wikipedia.org/wiki/Breadth-first_search

  4. Wikipedia: Conditional probability. http://en.wikipedia.org/wiki/Conditional_probability

  5. Wikipedia: Cycle. http://en.wikipedia.org/wiki/Cycle_(graph_theory)

  6. Wikipedia: Damping ratio. http://en.wikipedia.org/wiki/Damping_ratio

  7. Wikipedia: Depth-first search. http://en.wikipedia.org/wiki/Depth-first_search

  8. Wikipedia: Graph. http://en.wikipedia.org/wiki/Graph_(mathematics)

  9. Wikipedia: Graph theory. http://en.wikipedia.org/wiki/Graph_theory

  10. Wikipedia: Graph traversal. http://en.wikipedia.org/wiki/Graph_traversal

  11. Wikipedia: Hyperlink. http://en.wikipedia.org/wiki/Hyperlink

  12. Wikipedia: Hyperlink-Induced Topic Search (HITS). http://en.wikipedia.org/wiki/HITS_algorithm

  13. Wikipedia: Hypertext. http://en.wikipedia.org/wiki/Hypertext

  14. Wikipedia: HyperText Mark-up Language (HTML). http://en.wikipedia.org/wiki/HTML

  15. Wikipedia: HyperText Transfer Protocol (HTTP). http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol

  16. Wikipedia: Impact factor. http://en.wikipedia.org/wiki/Impact_factor

  17. Wikipedia: Information retrieval. http://en.wikipedia.org/wiki/Information_retrieval

  18. Wikipedia: Iterative method. http://en.wikipedia.org/wiki/Iterative_method

  19. Wikipedia: Mark-up language. http://en.wikipedia.org/wiki/Markup_language

  20. Wikipedia: Meta element. http://en.wikipedia.org/wiki/Meta_element

  21. Wikipedia: Newton-Raphson method. https://en.wikipedia.org/wiki/Newton's_method

  22. Wikipedia: On-Line System (NLS). http://en.wikipedia.org/wiki/NLS_(computer_system)

  23. Wikipedia: Online and offline. http://en.wikipedia.org/wiki/Online_and_offline

  24. Wikipedia: Open Directory Project (ODP). http://en.wikipedia.org/wiki/Open_Directory_Project

  25. Wikipedia: PageRank. http://en.wikipedia.org/wiki/PageRank

  26. Wikipedia: Pre-computation. http://en.wikipedia.org/wiki/Precomputation

  27. Wikipedia: Precision. http://en.wikipedia.org/wiki/Precision_(computer_science)

  28. Wikipedia: Random walk. http://en.wikipedia.org/wiki/Random_walk

  29. Wikipedia: Robots exclusion standard. http://en.wikipedia.org/wiki/Robots_Exclusion_Standard

  30. Wikipedia: Search engine optimization. http://en.wikipedia.org/wiki/Search_engine_optimization

  31. Wikipedia: System of linear equations. https://en.wikipedia.org/wiki/System_of_linear_equations

  32. Wikipedia: The mother of all demos. http://en.wikipedia.org/wiki/The_Mother_of_All_Demos

  33. Wikipedia: Top Level Domain (TLD). http://en.wikipedia.org/wiki/Top-level_domain

  34. Wikipedia: Tree. https://en.wikipedia.org/wiki/Tree_(graph_theory)

  35. Wikipedia: Web-crawler. http://en.wikipedia.org/wiki/Web_crawler

  36. Wikipedia: Web-directory. http://en.wikipedia.org/wiki/Web_directory

  37. Wikipedia: Web-graph. http://en.wikipedia.org/wiki/Webgraph

  38. Wikipedia: Web indexing. http://en.wikipedia.org/wiki/Web_indexing

  39. Wikipedia: Web-search engine. http://en.wikipedia.org/wiki/Web_search_engine

  40. Wikipedia: Web-search query. http://en.wikipedia.org/wiki/Web_search_query

  41. Wikipedia: World Wide Web. http://en.wikipedia.org/wiki/World_Wide_Web

  42. Wikipedia: World Wide Web Virtual Library (WWWVL). http://en.wikipedia.org/wiki/World_Wide_Web_Virtual_Library

Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Page, D., Smart, N. (2014). Demystifying Web-Search: the Mathematics of PageRank. In: What Is Computer Science?. Undergraduate Topics in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-04042-4_5

  • DOI: https://doi.org/10.1007/978-3-319-04042-4_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-04041-7

  • Online ISBN: 978-3-319-04042-4

  • eBook Packages: Computer Science, Computer Science (R0)
