Traffic Analytics for Linked Data Publishers

  • Luca Costabello
  • Pierre-Yves Vandenbussche
  • Gofran Shukair
  • Corine Deliot
  • Neil Wilson
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10249)

Abstract

We present a traffic analytics platform for servers that publish Linked Data. To the best of our knowledge, this is the first system that mines the access logs of registered Linked Data servers to extract traffic insights on a daily basis and without human intervention. The framework extracts Linked Data-specific traffic metrics from log records of HTTP lookups and SPARQL queries, and provides insights not available in traditional web analytics tools. In particular, we detect visitor sessions with a variant of hierarchical agglomerative clustering. We also identify workload peaks of SPARQL endpoints by classifying SPARQL queries as heavy or light with supervised learning. The platform has been tested on 13 months of access logs of the British National Bibliography RDF dataset.
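
To make the session detection step concrete, the following is a minimal sketch of how request timestamps from one visitor could be grouped into sessions with single-linkage agglomerative clustering. It is an illustration only, not the variant used by the platform, whose details are not given in this abstract. The function name `detect_sessions`, the `max_gap` parameter, and the 30-minute cutoff in the usage example are all assumptions introduced here.

```python
from typing import List


def detect_sessions(timestamps: List[float], max_gap: float) -> List[List[float]]:
    """Group request timestamps (in seconds) into sessions.

    Hypothetical helper: single-linkage agglomerative clustering in one
    dimension. Every request starts as its own cluster; the two closest
    adjacent clusters are merged repeatedly until the smallest remaining
    gap exceeds max_gap (an assumed cutoff, not taken from the paper).
    """
    clusters = [[t] for t in sorted(timestamps)]
    while len(clusters) > 1:
        # Single-linkage distance between neighbouring clusters:
        # first request of the next cluster minus last request of this one.
        gaps = [clusters[i + 1][0] - clusters[i][-1]
                for i in range(len(clusters) - 1)]
        closest = min(range(len(gaps)), key=gaps.__getitem__)
        if gaps[closest] > max_gap:
            break  # no pair of clusters is close enough to merge
        clusters[closest:closest + 2] = [clusters[closest] + clusters[closest + 1]]
    return clusters


# Usage: six requests, with an assumed 30-minute (1800 s) inactivity cutoff.
sessions = detect_sessions([0, 10, 20, 2000, 2010, 5000], max_gap=1800)
print(len(sessions))
```

With a fixed cutoff, this one-dimensional case is equivalent to splitting the sorted timestamps wherever the gap between consecutive requests exceeds `max_gap`. The explicit merge loop is kept only to show the agglomerative structure.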

Keywords

Linked data · Traffic analytics · Data publication · SPARQL

Notes

Acknowledgements

This work has been supported by the TOMOE project funded by Fujitsu Laboratories Limited in collaboration with Insight Centre at NUI Galway.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Luca Costabello (1)
  • Pierre-Yves Vandenbussche (1)
  • Gofran Shukair (1)
  • Corine Deliot (2)
  • Neil Wilson (2)

  1. Fujitsu Ireland Ltd., Galway, Ireland
  2. British Library, London, UK
