Abstract
This paper reports on a multi-fold approach to building user models based on the identification of navigation patterns in a virtual campus. Such models allow the campus's usability to be adapted to learners' actual needs, thereby stimulating the learning experience. However, user modeling in this context requires constant processing and analysis of user interaction data during long-term learning activities, which produces huge amounts of valuable data, typically stored in server log files. Because the log files generated daily are large or very large, massive processing is an essential first step in extracting useful information. To this end, this work first studies the viability of processing large log data files of a real Virtual Campus using different distributed infrastructures. More precisely, we study the time performance of massive processing of daily log files, implemented following the master-slave paradigm and evaluated on Cluster Computing and PlanetLab platforms. The study reveals the complexity and challenges of massive processing in the big data era, such as the need to carefully tune the size of the log data chunks processed at slave nodes, and the processing bottleneck that arises in truly geographically distributed infrastructures from the communication overhead between the master and slave nodes. We then present an application of the massive processing approach that yields log data processed and stored in a well-structured format. We show how to extract knowledge from the log data using the WEKA framework for data mining, and demonstrate its usefulness for building user models by identifying interesting navigation patterns of on-line learners. The study is motivated by and conducted on the actual data logs of the Virtual Campus of the Open University of Catalonia.
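The master-slave processing of log chunks described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the log follows the Apache Common Log Format, uses local worker threads as a stand-in for the slave nodes of a cluster or PlanetLab deployment, and all names (`split_into_chunks`, `process_chunk`, `process_log`, `SAMPLE_LOG`) and the sample entries are hypothetical.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Assumed input format (Apache Common Log Format):
# host ident user [time] "method path protocol" status bytes
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def split_into_chunks(lines, chunk_size):
    """Master side: cut the log into chunks of at most chunk_size lines."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def process_chunk(chunk):
    """Worker side: parse one chunk and count requests per (user, path)."""
    counts = Counter()
    for line in chunk:
        match = LOG_LINE.match(line)
        if match:  # lines that do not fit the assumed format are skipped
            counts[(match["user"], match["path"])] += 1
    return counts


def process_log(lines, chunk_size=2, workers=4):
    """Master: hand chunks to the workers, then merge their partial counts."""
    total = Counter()
    chunks = split_into_chunks(lines, chunk_size)
    # Threads stand in for remote slave nodes in this local sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(process_chunk, chunks):
            total.update(partial)
    return total


# Hypothetical log entries, for illustration only.
SAMPLE_LOG = [
    '10.0.0.1 - alice [10/Oct/2012:13:55:36 +0200] "GET /campus/forum HTTP/1.1" 200 2326',
    '10.0.0.2 - bob [10/Oct/2012:13:55:40 +0200] "GET /campus/mail HTTP/1.1" 200 512',
    '10.0.0.1 - alice [10/Oct/2012:13:56:02 +0200] "GET /campus/forum HTTP/1.1" 200 2326',
    'this line is malformed',
    '10.0.0.2 - bob [10/Oct/2012:13:57:11 +0200] "GET /campus/forum HTTP/1.1" 200 2326',
]

if __name__ == "__main__":
    for (user, path), hits in sorted(process_log(SAMPLE_LOG).items()):
        print(user, path, hits)
```

In this sketch `chunk_size` plays the role of the chunk-size tuning parameter the abstract mentions: the merged counts are the same for any chunk size, but the number of master-to-worker hand-offs changes, and in a geographically distributed setting each hand-off would carry a communication cost.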
Acknowledgements
This work has been funded by the Open University of Catalonia (under the 2012 UOC-UPC agreements, n. 11-273-231 and n. 11-270-277) and supported by the European Commission under the Collaborative Project ALICE (VII Framework Programme, ICT-2009.4.2 TEL, n. 257639).
Cite this article
Caballé, S., Xhafa, F. Distributed-based massive processing of activity logs for efficient user modeling in a Virtual Campus. Cluster Comput 16, 829–844 (2013). https://doi.org/10.1007/s10586-013-0256-9
DOI: https://doi.org/10.1007/s10586-013-0256-9