Efficient Indexing and Representation of Web Access Logs

  • Conference paper
String Processing and Information Retrieval (SPIRE 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8799)

Abstract

We present a space-efficient data structure, based on the Burrows-Wheeler Transform, designed specifically to handle web sequence logs, which are needed by web usage mining processes. Our index processes a set of operations efficiently while maintaining the original information in compressed form. Results show that web access logs can be represented using 0.85 to 1.03 times their original (plain) size, while most operations execute within a few tens of microseconds.
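To give a feel for the kind of query a Burrows-Wheeler-based index supports, the following is a minimal sketch of backward search over concatenated page-visit sessions. It is not the index described in the paper: the names `SEP`, `build_bwt`, `rank`, and `count_occurrences` are hypothetical, the suffix sorting is naive, and `rank` is a linear scan where a compressed index would typically use a succinct rank structure. Sessions are assumed to be lists of positive integer page ids.

```python
# Illustrative sketch only; names and encoding are assumptions, not the paper's API.
SEP = 0  # hypothetical session terminator, smaller than every page id


def build_bwt(sessions):
    """Concatenate sessions and return (bwt, C) for backward search."""
    text = []
    for session in sessions:
        text.extend(session)
        text.append(SEP)
    n = len(text)
    # Naive suffix array: sort suffix start positions by the suffix itself.
    suffix_array = sorted(range(n), key=lambda i: text[i:])
    # BWT symbol = symbol preceding each sorted suffix (wraps for position 0).
    bwt = [text[i - 1] for i in suffix_array]
    # C[c] = number of symbols in the text strictly smaller than c.
    C = {}
    total = 0
    for symbol in sorted(set(text)):
        C[symbol] = total
        total += text.count(symbol)
    return bwt, C


def rank(bwt, symbol, i):
    """Occurrences of symbol in bwt[:i] (linear scan, for clarity only)."""
    return bwt[:i].count(symbol)


def count_occurrences(bwt, C, pattern):
    """Count how often the page sequence `pattern` occurs inside sessions.

    The pattern must consist of page ids only (no SEP), so matches never
    span two sessions.
    """
    if SEP in pattern:
        raise ValueError("pattern must not contain the session terminator")
    lo, hi = 0, len(bwt)
    for symbol in reversed(pattern):
        if symbol not in C:
            return 0
        lo = C[symbol] + rank(bwt, symbol, lo)
        hi = C[symbol] + rank(bwt, symbol, hi)
        if lo >= hi:
            return 0
    return hi - lo


sessions = [[1, 2, 3], [2, 3, 1], [1, 2, 3, 2, 3]]
bwt, C = build_bwt(sessions)
print(count_occurrences(bwt, C, [2, 3]))     # page 2 followed by page 3
print(count_occurrences(bwt, C, [1, 2, 3]))  # the path 1 -> 2 -> 3
```

Each step of the loop narrows a range of sorted suffixes by one symbol, so the number of steps depends on the pattern length rather than on the log size; the per-step cost is whatever the rank structure provides.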

This work was partially supported by the Conicyt PhD Scholarship, by Fondecyt Iniciación Grant 11130104, and by Millennium Nucleus Information and Coordination in Networks ICM/FIC P10-024F.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Claude, F., Konow, R., Navarro, G. (2014). Efficient Indexing and Representation of Web Access Logs. In: Moura, E., Crochemore, M. (eds) String Processing and Information Retrieval. SPIRE 2014. Lecture Notes in Computer Science, vol 8799. Springer, Cham. https://doi.org/10.1007/978-3-319-11918-2_7

  • DOI: https://doi.org/10.1007/978-3-319-11918-2_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11917-5

  • Online ISBN: 978-3-319-11918-2

  • eBook Packages: Computer Science, Computer Science (R0)
