Abstract
The number of websites has grown rapidly in recent decades, and the number of visitors has kept pace, generating huge volumes of data. These data are believed to contain hidden knowledge valuable for activities related to e-Business, e-CRM, e-Services, e-Newspapers, e-Government, Digital Libraries, and so on. Web usage mining is the process applied to such data to extract this knowledge efficiently. In this paper, we use the process to uncover interesting patterns in a web server access log file gathered from Ho Chi Minh City University of Technology (HCMUT) in Vietnam. Moreover, we propose a novel model that constructs new attributes, namely country, province (or city), and Internet Service Provider (ISP), from the existing IP attribute. The model is a form of attribute construction (or feature construction), one of the data transformation strategies used in data pre-processing. By applying this mining process, we obtain broad knowledge of user access patterns for every country, province, and ISP. Such knowledge can be leveraged to optimize system performance and enhance personalization. It can also help in deciding reasonable caching policies for web proxies.
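The attribute construction step described above can be illustrated with a minimal sketch. It is not the model used in the paper: it assumes the log follows the Common Log Format, and the lookup table `GEO_TABLE`, the names `GeoInfo`, `lookup`, and `enrich`, and the address ranges and ISP names are all hypothetical placeholders for a real IP geolocation source.

```python
import ipaddress
import re
from typing import NamedTuple, Optional

# Assumed input: Common Log Format, i.e.
# host ident authuser [timestamp] "request" status bytes
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


class GeoInfo(NamedTuple):
    country: str
    province: str
    isp: str


# Hypothetical lookup table. The networks are documentation ranges
# (RFC 5737) and the ISP names are invented for illustration only.
GEO_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"),
     GeoInfo("Vietnam", "Ho Chi Minh City", "ExampleNet")),
    (ipaddress.ip_network("198.51.100.0/24"),
     GeoInfo("Japan", "Tokyo", "SampleISP")),
]

UNKNOWN = GeoInfo("unknown", "unknown", "unknown")


def lookup(ip: str) -> GeoInfo:
    """Return the constructed attributes for an IP, or UNKNOWN if unmatched."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # The host field may hold a hostname rather than an IP address.
        return UNKNOWN
    for network, info in GEO_TABLE:
        if addr in network:
            return info
    return UNKNOWN


def enrich(line: str) -> Optional[dict]:
    """Parse one log line and add country, province, and isp attributes."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # malformed line; a cleaning step would drop it
    record = match.groupdict()
    record.update(lookup(record["ip"])._asdict())
    return record
```

Calling `enrich` on each line of a log file yields records that can be grouped by `country`, `province`, or `isp` to study access patterns. A linear scan over the table is enough for a sketch; a production lookup would need an indexed structure and a real geolocation database.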
Acknowledgements
This research was conducted within the project "Studying collaborative caching algorithms in content delivery network", sponsored by TIS (IT Holding Group).
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Nguyen, MT., Diep, TD., Hoang Vinh, T., Nakajima, T., Thoai, N. (2018). Analyzing and Visualizing Web Server Access Log File. In: Dang, T., Küng, J., Wagner, R., Thoai, N., Takizawa, M. (eds) Future Data and Security Engineering. FDSE 2018. Lecture Notes in Computer Science, vol 11251. Springer, Cham. https://doi.org/10.1007/978-3-030-03192-3_27
DOI: https://doi.org/10.1007/978-3-030-03192-3_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03191-6
Online ISBN: 978-3-030-03192-3
eBook Packages: Computer Science, Computer Science (R0)