The Effects of Web Logs and the Semantic Web on Autonomous Web Agents

  • Michael P. Evans
  • Richard Newman
  • Timothy A. Millea
  • Timothy Putnam
  • Andrew Walker
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3280)

Abstract

Search engines exploit the Web’s hyperlink structure to help infer information content. The new phenomenon of personal Web logs, or ‘blogs’, encourages more extensive annotation of Web content. If their resulting link structures bias the Web crawling applications that search engines depend upon, there are implications for another form of annotation rapidly on the rise, the Semantic Web. We conducted a Web crawl of 160 000 pages in which the link structure of the Web was compared with that of several thousand blogs. Results show that the two link structures are significantly different. We analyse the differences and infer the likely effect upon the performance of existing and future Web agents. The Semantic Web offers new opportunities to navigate the Web, but Web agents should be designed to take advantage of the emerging link structures, or their effectiveness will diminish.

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Michael P. Evans (1)
  • Richard Newman (1)
  • Timothy A. Millea (1)
  • Timothy Putnam (1)
  • Andrew Walker (2)

  1. Applied Software Engineering Group, School of Systems Engineering, The University of Reading, Reading, UK
  2. School of Mathematics, Kingston University, Kingston-upon-Thames, Surrey, UK