
Analyzing and Visualizing Web Server Access Log File

  • Minh-Tri Nguyen (Email author)
  • Thanh-Dang Diep
  • Tran Hoang Vinh
  • Takuma Nakajima
  • Nam Thoai
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11251)

Abstract

The number of websites has grown rapidly in recent decades, and the number of visitors has kept pace, so a huge amount of data is created. These data are believed to contain hidden knowledge that is valuable for activities related to e-Business, e-CRM, e-Services, e-Newspapers, e-Government, Digital Libraries, and so on. To extract knowledge from web data efficiently, a process called web usage mining is applied. In this paper, we use this process to uncover interesting patterns in a web server access log file gathered from Ho Chi Minh City University of Technology (HCMUT) in Vietnam. Moreover, we propose a novel model that constructs new attributes, namely country, province (or city) and Internet Service Provider (ISP), from the existing IP attribute and adds them to the data. The model is a form of attribute construction (or feature construction), which is one of the strategies of data transformation, itself a data pre-processing technique. By applying this mining process, we obtain knowledge about user access patterns for every country, province and ISP. Such knowledge can be leveraged to optimize system performance and to enhance personalization. It can also help in deciding reasonable caching policies for web proxies.
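The attribute construction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the lookup table `GEO_TABLE`, the ISP names, the sample log lines and the helper names `lookup` and `enrich` are all assumptions made for illustration, and the IP ranges are reserved documentation ranges rather than real allocations. The sketch parses lines in the extended common log file format, derives country, province and ISP from the IP field, and counts requests per country.

```python
import ipaddress
import re
from collections import Counter

# Extended common log format:
# host ident user [time] "request" status bytes "referer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)

# Hypothetical lookup table (assumption): a real system would use a
# geolocation database. The ranges below are documentation-only ranges.
GEO_TABLE = [
    (ipaddress.ip_network("203.0.113.0/24"),
     ("Vietnam", "Ho Chi Minh City", "ExampleNet")),
    (ipaddress.ip_network("198.51.100.0/24"),
     ("Japan", "Tokyo", "SampleISP")),
]
UNKNOWN = ("unknown", "unknown", "unknown")


def lookup(ip_text):
    """Return (country, province, isp) for an IP string, or UNKNOWN."""
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        return UNKNOWN  # the host field was not a valid IP address
    for network, attributes in GEO_TABLE:
        if ip in network:
            return attributes
    return UNKNOWN


def enrich(line):
    """Parse one log line and add the three constructed attributes."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # malformed line, dropped during cleaning
    record = match.groupdict()
    record["country"], record["province"], record["isp"] = lookup(record["ip"])
    return record


# Made-up sample lines (assumption), not taken from the HCMUT log.
SAMPLE_LINES = [
    '203.0.113.7 - - [10/Jul/2018:10:00:00 +0700] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '198.51.100.9 - - [10/Jul/2018:10:00:05 +0700] "GET /news HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
    '203.0.113.8 - - [10/Jul/2018:10:00:09 +0700] "GET /index.html HTTP/1.1" 304 - "-" "Mozilla/5.0"',
    'not a log line',
]

records = [r for r in map(enrich, SAMPLE_LINES) if r is not None]
hits_by_country = Counter(r["country"] for r in records)
print(hits_by_country.most_common())
```

Grouping the enriched records by `province` or `isp` instead of `country` follows the same pattern. A linear scan over the table is enough for a sketch; a large lookup table would more likely use a sorted range index or a prefix tree.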

Keywords

Web usage mining; Data transformation; Server log file; Access log; Extended common log file format

Notes

Acknowledgements

This research was conducted within the project "Studying collaborative caching algorithms in content delivery network", sponsored by TIS (IT Holding Group).

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Minh-Tri Nguyen (1), Email author
  • Thanh-Dang Diep (1)
  • Tran Hoang Vinh (1)
  • Takuma Nakajima (2)
  • Nam Thoai (1)

  1. Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology, VNUHCM, Ho Chi Minh City, Vietnam
  2. Graduate School of Information Systems, The University of Electro-Communications, Chofu-shi, Japan
