Characterizing Crawler Behavior from Web Server Access Logs

  • Marios Dikaiakos
  • Athena Stassopoulou
  • Loizos Papageorgiou
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2738)

Abstract

In this paper, we present a study of crawler behavior based on Web-server access logs. We use logs from five academic sites in three countries to analyze the activity of crawlers belonging to five search engines: Google, AltaVista, Inktomi, FastSearch and CiteSeer. We compare crawler behavior with the characteristics of general World-Wide Web traffic and with general characterization studies based on Web-server access logs, and we analyze crawler requests to derive insights into the behavior and strategy of crawlers. Our results and observations serve as a basis for our ongoing work on the automatic detection of WWW robots.
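As a rough illustration of the kind of log analysis described above, the following minimal sketch tallies requests per search engine from access-log lines. It is not the authors' method, which the abstract does not describe. It assumes the logs are in the Apache Combined Log Format and that crawlers can be recognized by a substring of the User-Agent field. The token table `CRAWLER_TOKENS`, the helper names, and the sample log lines are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Assumed layout (Combined Log Format):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Hypothetical mapping from a lowercase User-Agent substring to a search
# engine. These tokens are an assumption, not taken from the paper.
CRAWLER_TOKENS = {
    "googlebot": "Google",
    "scooter": "AltaVista",
    "slurp": "Inktomi",
    "fast-webcrawler": "FastSearch",
}


def classify_agent(agent):
    """Return the engine name for a User-Agent string, or None if unknown."""
    lowered = agent.lower()
    for token, engine in CRAWLER_TOKENS.items():
        if token in lowered:
            return engine
    return None


def count_crawler_requests(lines):
    """Count requests per engine, skipping lines that do not parse."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            continue
        engine = classify_agent(match.group("agent"))
        if engine is not None:
            counts[engine] += 1
    return counts


# Made-up sample lines, for illustration only.
SAMPLE_LOG = [
    '192.0.2.10 - - [10/Oct/2002:13:55:36 +0200] "GET /robots.txt HTTP/1.0" '
    '200 68 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '192.0.2.11 - - [10/Oct/2002:13:56:01 +0200] "GET /index.html HTTP/1.0" '
    '200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    '192.0.2.12 - - [10/Oct/2002:13:57:12 +0200] "HEAD /papers/ HTTP/1.0" '
    '304 - "-" "Scooter/3.2"',
]
```

Calling `count_crawler_requests(SAMPLE_LOG)` counts one request each for the two crawler lines and ignores the browser line. Matching on the User-Agent field alone is a simplification: a robot that does not identify itself would be missed.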

Keywords

Fast Fourier Transform; Fast Fourier Transform Function; Crawler Behavior; General Characterization Study
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Marios Dikaiakos (1)
  • Athena Stassopoulou (2)
  • Loizos Papageorgiou (1)
  1. Department of Computer Science, University of Cyprus, Nicosia, Cyprus
  2. Dept. of Computer Science, Intercollege, Nicosia, Cyprus
