Distributed-based massive processing of activity logs for efficient user modeling in a Virtual Campus

Cluster Computing

Abstract

This paper reports on a multi-fold approach to building user models based on the identification of navigation patterns in a virtual campus, which allows the campus’ usability to be adapted to learners’ actual needs and thus stimulates the learning experience. However, user modeling in this context requires constant processing and analysis of user interaction data during long-term learning activities, which produce huge amounts of valuable data, typically stored in server log files. Because the log files generated daily are large or very large, massive processing is a foremost step in extracting useful information. To this end, this work first studies the viability of processing large log data files of a real Virtual Campus using different distributed infrastructures. More precisely, we study the time performance of massive processing of daily log files, implemented following the master-slave paradigm and evaluated on Cluster Computing and PlanetLab platforms. The study reveals the complexity and challenges of massive processing in the big data era, such as the need to carefully tune the size of the log data chunks processed at the slave nodes, and the processing bottleneck that arises in truly geographically distributed infrastructures from the communication overhead between the master and slave nodes. We then present an application of the massive processing approach that yields log data processed and stored in a well-structured format. We show how to extract knowledge from the log data by using the WEKA framework for data mining, and demonstrate its usefulness for effectively building user models by identifying interesting navigation patterns of on-line learners. The study is motivated by and conducted in the context of the actual data logs of the Virtual Campus of the Open University of Catalonia.
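The master-slave scheme outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes log lines in the Apache Common Log Format, uses local worker threads as a stand-in for remote slave nodes, and all names (`split_into_chunks`, `process_chunk`, `process_log`, `sample_log`) are hypothetical. The `chunk_size` parameter corresponds to the chunk-size tuning the study identifies as critical.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Assumed input format (Apache Common Log Format):
# host ident user [time] "method path protocol" status bytes
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def split_into_chunks(lines, chunk_size):
    """Master side: cut the log into chunks of at most chunk_size lines."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def process_chunk(chunk):
    """Worker side: count requests per (user, path); skip malformed lines."""
    counts = Counter()
    for line in chunk:
        match = LOG_LINE.match(line)
        if match:
            counts[(match["user"], match["path"])] += 1
    return counts


def process_log(lines, chunk_size=2, workers=2):
    """Master: dispatch chunks to the workers and merge their partial counts."""
    total = Counter()
    chunks = split_into_chunks(lines, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(process_chunk, chunks):
            total.update(partial)  # Counter.update adds counts together
    return total


# Hypothetical sample data, including one malformed line.
sample_log = [
    '10.0.0.1 - alice [01/Mar/2012:10:00:00 +0100] "GET /campus/forum HTTP/1.1" 200 512',
    '10.0.0.2 - bob [01/Mar/2012:10:00:05 +0100] "GET /campus/mail HTTP/1.1" 200 128',
    'this line is not a valid log entry',
    '10.0.0.1 - alice [01/Mar/2012:10:01:00 +0100] "GET /campus/forum HTTP/1.1" 200 512',
]

result = process_log(sample_log)
```

In a geographically distributed setting, each call to `process_chunk` would also involve transferring the chunk to a remote node, so a smaller `chunk_size` increases parallelism but also the number of master-to-slave transfers. This is the trade-off the abstract refers to.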

Notes

  1. eagle.lsi.upc.edu.

Acknowledgements

This work has been funded by the Open University of Catalonia (under the 2012 UOC-UPC agreements, n. 11-273-231 and n. 11-270-277) and supported by the European Commission under the Collaborative Project ALICE (VII Framework Programme, ICT-2009.4.2 TEL, n. 257639).

Author information

Corresponding author

Correspondence to Santi Caballé.

About this article

Cite this article

Caballé, S., Xhafa, F. Distributed-based massive processing of activity logs for efficient user modeling in a Virtual Campus. Cluster Comput 16, 829–844 (2013). https://doi.org/10.1007/s10586-013-0256-9

  • DOI: https://doi.org/10.1007/s10586-013-0256-9

Keywords

Navigation