
Efficient Indexing and Representation of Web Access Logs

  • Francisco Claude
  • Roberto Konow
  • Gonzalo Navarro
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8799)

Abstract

We present a space-efficient data structure, based on the Burrows-Wheeler Transform, designed specifically to handle the web sequence logs needed by web usage mining processes. Our index supports a set of operations efficiently while keeping the original information in compressed form. Results show that web access logs can be represented using 0.85 to 1.03 times their original (plain) size, while most operations execute within a few tens of microseconds.
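The abstract does not describe how the index is built. As a rough illustration of the underlying idea, here is a minimal Python sketch. It assumes each user's session is a list of positive integer page IDs. The separator `0`, the sentinel `-1`, and the names `SessionBWT`, `count`, and `distinct_users` are hypothetical and are not taken from the paper.

The sketch builds the BWT of the concatenated sessions. It then uses FM-index backward search to find how often a navigation pattern occurs and which users followed it. It keeps plain Python lists instead of a compressed rank structure such as a wavelet tree, so it says nothing about the space figures reported above.

```python
from bisect import bisect_left
from collections import defaultdict

SEP = 0   # hypothetical separator placed after each user's session
END = -1  # hypothetical unique end-of-text sentinel, smaller than every other symbol


class SessionBWT:
    """Naive, uncompressed BWT index over concatenated user sessions (illustrative only)."""

    def __init__(self, sessions):
        text, user_of = [], []
        for user, pages in enumerate(sessions):
            for p in list(pages) + [SEP]:
                text.append(p)
                user_of.append(user)  # remember which user each text position belongs to
        text.append(END)
        user_of.append(-1)
        n = len(text)
        # Suffix array by direct sorting: O(n^2 log n), acceptable only for toy inputs.
        self.sa = sorted(range(n), key=lambda i: text[i:])
        # BWT: the symbol preceding each sorted suffix (text[-1] wraps around to END).
        self.bwt = [text[i - 1] for i in self.sa]
        # C[c] = number of symbols in the text strictly smaller than c.
        self.C = {}
        for k, c in enumerate(sorted(text)):
            self.C.setdefault(c, k)
        # Sorted BWT positions of each symbol; a stand-in for a rank structure.
        self.pos = defaultdict(list)
        for j, c in enumerate(self.bwt):
            self.pos[c].append(j)
        self.user_of = user_of

    def rank(self, c, i):
        """Number of occurrences of symbol c in bwt[0:i]."""
        return bisect_left(self.pos[c], i)

    def sa_range(self, pattern):
        """Backward search: half-open suffix-array interval of suffixes starting with pattern."""
        lo, hi = 0, len(self.bwt)
        for c in reversed(pattern):
            if c not in self.C:
                return 0, 0
            lo = self.C[c] + self.rank(c, lo)
            hi = self.C[c] + self.rank(c, hi)
            if lo >= hi:
                return 0, 0
        return lo, hi

    def count(self, pattern):
        """Total occurrences of a page sequence across all sessions."""
        lo, hi = self.sa_range(pattern)
        return hi - lo

    def distinct_users(self, pattern):
        """Users whose session contains the pattern (naive scan of the SA interval)."""
        lo, hi = self.sa_range(pattern)
        return sorted({self.user_of[self.sa[j]] for j in range(lo, hi)})
```

Consider the sessions `[[1, 2, 3], [2, 3, 4], [1, 2, 3, 4]]`. Here `count([2, 3])` returns 3 and `distinct_users([2, 3])` returns `[0, 1, 2]`.

A pattern made only of page IDs cannot contain the separator. As a result, every match lies inside a single session, which is why mapping a suffix back to its user is enough to list distinct users.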

Keywords

Priority Queue, Space Usage, Wavelet Tree, Distinct User, Select Query
(Keywords generated automatically, not by the authors.)



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Francisco Claude (1)
  • Roberto Konow (1, 2)
  • Gonzalo Navarro (2)
  1. Escuela de Informática y Telecomunicaciones, Universidad Diego Portales, Chile
  2. Department of Computer Science, University of Chile, Chile
