Data Summarization Model for User Action Log Files

  • Eleonora Gentili
  • Alfredo Milani
  • Valentina Poggioni
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7335)

Abstract

In recent years we have seen impressive growth and diffusion of applications shared and used by huge numbers of users around the world, such as social networks, web portals, and e-learning platforms. Such systems generally produce a large amount of data, normally stored in raw format in log file systems and databases. To prevent unmanageable growth of the required storage space and a breakdown of data usability, such data can be condensed and summarized to improve reporting performance and reduce system load. Data summarization reduces the space required to store software data but, as a side effect, decreases their informative capability because of information loss. This work studies the problem of summarizing data obtained from the log systems of applications with many users. In particular, a model is proposed that represents these raw data as temporal events collected in time sequences; methods are provided to reduce the data size by collapsing the descriptions of several events into a unique descriptor or a smaller set of descriptors; and the optimal summarization problem is posed.
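To make the idea of collapsing several event descriptions into one more concrete, here is a minimal Python sketch. It is not the authors' formal model: the `Event` layout, the `merge` function, the greedy `summarize` strategy, and the `max_gap` parameter are all assumptions introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A temporal event: a time interval plus a set of descriptors.

    Hypothetical layout, not taken from the paper.
    """
    start: int
    end: int
    descriptors: frozenset


def merge(a: Event, b: Event) -> Event:
    """Illustrative merging operator: one event spanning both intervals,
    described by the union of the two descriptor sets."""
    return Event(
        min(a.start, b.start),
        max(a.end, b.end),
        a.descriptors | b.descriptors,
    )


def summarize(sequence, max_gap):
    """Greedily merge consecutive events separated by at most max_gap time units.

    This is a simple heuristic, not a solution to the optimal summarization problem.
    """
    summary = []
    for event in sorted(sequence, key=lambda e: e.start):
        if summary and event.start - summary[-1].end <= max_gap:
            summary[-1] = merge(summary[-1], event)
        else:
            summary.append(event)
    return summary


# Hypothetical user action log: three raw events.
log = [
    Event(0, 2, frozenset({"login"})),
    Event(3, 5, frozenset({"view_page"})),
    Event(40, 42, frozenset({"logout"})),
]
summary = summarize(log, max_gap=5)
print(len(log), "events summarized into", len(summary))
```

In this sketch the first two events collapse into a single event over the interval 0 to 5. The summary is smaller, but it no longer records which action happened first, which illustrates the information loss that the abstract mentions.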

Keywords

Time Sequence, Unique Descriptor, Event Descriptor, Merging Operator, Abstraction Operator

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Eleonora Gentili (1)
  • Alfredo Milani (1)
  • Valentina Poggioni (1)
  1. Dipartimento di Matematica e Informatica, Università degli Studi di Perugia, Perugia, Italy
