Analyzing and Visualizing Web Server Access Log File

  • Conference paper
Future Data and Security Engineering (FDSE 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11251)

Abstract

The number of websites has grown rapidly in recent decades, and the number of visitors has kept pace, producing huge volumes of data. These data are believed to contain hidden knowledge that is valuable for activities related to e-Business, e-CRM, e-Services, e-Newspapers, e-Government, Digital Libraries, and so on. To extract knowledge from web data efficiently, a process called web usage mining is applied. In this paper, we use this process to uncover interesting patterns in a web server access log file gathered from Ho Chi Minh City University of Technology (HCMUT) in Vietnam. Moreover, we propose a novel model that constructs new attributes, namely country, province (or city), and Internet Service Provider (ISP), from the existing IP address attribute. The model is a form of attribute construction (or feature construction), one of the data transformation strategies used in data pre-processing. Applying the mining process to the enriched data yields detailed knowledge of user access patterns for each country, province, and ISP. Such knowledge can be leveraged to optimize system performance and to enhance personalization. It can also help in deciding reasonable caching policies for web proxies.
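The attribute construction step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' model: it assumes the log follows the Apache Common Log Format (the abstract does not state the layout), and it replaces a real geolocation database with a small hypothetical table, `GEO_TABLE`, built on reserved documentation address ranges. All function names, sample log lines, and location labels are illustrative assumptions.

```python
import ipaddress
import re
from collections import Counter

# Assumption: entries follow the Apache Common Log Format.
# The actual layout of the HCMUT log is not given in the abstract.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)

# Hypothetical lookup table standing in for a real IP geolocation database.
# The networks are reserved documentation ranges; the labels are made up.
GEO_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"),
     ("Vietnam", "Ho Chi Minh City", "ExampleNet")),
    (ipaddress.ip_network("198.51.100.0/24"),
     ("Japan", "Tokyo", "SampleISP")),
]
UNKNOWN = ("unknown", "unknown", "unknown")


def parse_line(line):
    """Return the log fields as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def lookup(ip_text):
    """Map an IP string to (country, province, isp) using GEO_TABLE."""
    try:
        address = ipaddress.ip_address(ip_text)
    except ValueError:
        return UNKNOWN
    for network, attributes in GEO_TABLE:
        if address in network:
            return attributes
    return UNKNOWN


def enrich(record):
    """Attribute construction: derive three new attributes from 'ip'."""
    country, province, isp = lookup(record["ip"])
    return {**record, "country": country, "province": province, "isp": isp}


# Made-up sample entries, for illustration only.
SAMPLE_LOG = [
    '192.0.2.10 - - [10/Jul/2018:10:00:00 +0700] "GET /index.html HTTP/1.1" 200 1024',
    '192.0.2.77 - - [10/Jul/2018:10:00:05 +0700] "GET /news HTTP/1.1" 200 2048',
    '198.51.100.5 - - [10/Jul/2018:10:01:00 +0700] "GET /index.html HTTP/1.1" 304 0',
    '203.0.113.9 - - [10/Jul/2018:10:02:00 +0700] "GET /missing HTTP/1.1" 404 512',
]

records = [enrich(r) for r in map(parse_line, SAMPLE_LOG) if r is not None]

# A simple per-country aggregate, the kind of summary a visualization could use.
requests_per_country = Counter(r["country"] for r in records)
```

Once every record carries `country`, `province`, and `isp`, per-group summaries such as `requests_per_country` reduce to ordinary group-and-count operations. On a full-size log the linear scan over `GEO_TABLE` would likely need to be replaced by an indexed lookup.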

Acknowledgements

This research was conducted within the project "Studying collaborative caching algorithms in content delivery network", sponsored by TIS (IT Holding Group).

Author information

Correspondence to Minh-Tri Nguyen.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Nguyen, MT., Diep, TD., Hoang Vinh, T., Nakajima, T., Thoai, N. (2018). Analyzing and Visualizing Web Server Access Log File. In: Dang, T., Küng, J., Wagner, R., Thoai, N., Takizawa, M. (eds) Future Data and Security Engineering. FDSE 2018. Lecture Notes in Computer Science, vol 11251. Springer, Cham. https://doi.org/10.1007/978-3-030-03192-3_27

  • DOI: https://doi.org/10.1007/978-3-030-03192-3_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-03191-6

  • Online ISBN: 978-3-030-03192-3

  • eBook Packages: Computer Science, Computer Science (R0)
