Similarity measures for OLAP sessions

Aligon, Julien; Golfarelli, Matteo; Marcel, Patrick; Rizzi, Stefano; Turricchia, Elisa

doi:10.1007/s10115-013-0614-1

Regular Paper
Published: 09 March 2013

Volume 39, pages 463–489, (2014)
Knowledge and Information Systems

Julien Aligon¹,
Matteo Golfarelli²,
Patrick Marcel¹,
Stefano Rizzi² &
Elisa Turricchia²

Abstract

OLAP queries are not normally formulated in isolation, but in the form of sequences called OLAP sessions. Recognizing that two OLAP sessions are similar would be useful for different applications, such as query recommendation and personalization; however, the problem of measuring OLAP session similarity has not been studied so far. In this paper, we aim at filling this gap. First, we propose a set of similarity criteria derived from a user study conducted with a set of OLAP practitioners and researchers. Then, we propose a function for estimating the similarity between OLAP queries based on three components: the query group-by set, its selection predicate, and the measures required in output. To assess the similarity of OLAP sessions, we investigate the feasibility of extending four popular methods for measuring similarity, namely the Levenshtein distance, the Dice coefficient, the tf–idf weight, and the Smith–Waterman algorithm. Finally, we experimentally compare these four extensions to show that the Smith–Waterman extension is the one that best captures the users’ criteria for session similarity.
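The pipeline summarized above (a query-level similarity built from three components, then a sequence-level comparison of sessions) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the paper's definitions: the `Query` type, the use of Jaccard overlap for each component, the equal weights, and the `threshold` and `gap` parameters are all hypothetical stand-ins. The alignment routine follows the general Smith–Waterman local-alignment scheme rather than the authors' specific extension.

```python
from typing import FrozenSet, NamedTuple, Sequence, Tuple


class Query(NamedTuple):
    """Hypothetical OLAP query, reduced to three sets of string labels."""
    group_by: FrozenSet[str]   # levels in the group-by set
    selection: FrozenSet[str]  # atomic selection predicates
    measures: FrozenSet[str]   # measures required in output


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard overlap (a stand-in); two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def query_similarity(q1: Query, q2: Query,
                     weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Weighted combination of the three component similarities, in [0, 1]."""
    w_g, w_s, w_m = weights
    return (w_g * jaccard(q1.group_by, q2.group_by)
            + w_s * jaccard(q1.selection, q2.selection)
            + w_m * jaccard(q1.measures, q2.measures))


def local_alignment_score(s1: Sequence[Query], s2: Sequence[Query],
                          threshold: float = 0.5, gap: float = 0.25) -> float:
    """Smith-Waterman-style score of the best local alignment of two sessions.

    A pair of queries contributes positively only when its similarity
    exceeds `threshold`; skipping a query costs `gap`. Both values are
    illustrative assumptions.
    """
    rows, cols = len(s1) + 1, len(s2) + 1
    h = [[0.0] * cols for _ in range(rows)]
    best = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            match = query_similarity(s1[i - 1], s2[j - 1]) - threshold
            h[i][j] = max(0.0,                      # restart the alignment
                          h[i - 1][j - 1] + match,  # align the two queries
                          h[i - 1][j] - gap,        # skip a query of s1
                          h[i][j - 1] - gap)        # skip a query of s2
            best = max(best, h[i][j])
    return best


# Usage: two sessions over hypothetical level, predicate and measure names.
q_a = Query(frozenset({"year", "state"}), frozenset({"year=2010"}), frozenset({"income"}))
q_b = Query(frozenset({"year", "city"}), frozenset({"year=2010"}), frozenset({"income"}))
q_c = Query(frozenset({"race"}), frozenset({"sex=F"}), frozenset({"count"}))
print(local_alignment_score([q_a, q_b], [q_a, q_b]))  # identical sessions
print(local_alignment_score([q_a, q_b], [q_c]))       # unrelated sessions
```

Local alignment rewards the best matching stretch of two sessions instead of penalizing every difference along their whole length, which is one plausible reading of why an alignment-based measure suits sessions that overlap only in part.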

Notes

Available at http://www.julien.aligon.fr/recherche/similarityform.aspx.
http://cs.ulb.ac.be/conferences/ebiss2011/.
Note that, while substrings are consecutive parts of a string, subsequences need not be.
While this enables a simpler formalization for group-by sets (see Definition 4.2), it does not significantly impact the overall approach. Indeed, partially ordered hierarchies could easily be handled by extending Definition 5.4 to measure the distance between two group-by sets on the multidimensional lattice, as suggested by Golfarelli [16].
In a relational implementation, a multidimensional schema can be translated into either a star or a snowflake schema. While the specific joins required in these two cases to formulate the same query are different, a user is completely unaware of this difference because OLAP tools completely hide the underlying SQL and logical schemata to let users reason on the multidimensional cube abstraction.
In the formula, the three rows of the \(min\) argument deal with deletions, insertions, and substitutions, respectively.
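The formula the last note refers to is not reproduced on this page. As a hedged illustration of the three cases it names, the sketch below implements the standard unit-cost Levenshtein recurrence (the Wagner–Fischer dynamic program), not the paper's extended version. The function name `levenshtein` and the example session labels are assumptions introduced here.

```python
from typing import Hashable, Sequence


def levenshtein(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Unit-cost edit distance between two sequences (strings or lists)."""
    # prev[j] holds the distance between the first i-1 items of a
    # and the first j items of b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning i items into an empty sequence takes i deletions
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of x
                curr[j - 1] + 1,         # insertion of y
                prev[j - 1] + (x != y),  # substitution, free when x == y
            ))
        prev = curr
    return prev[-1]


# Usage: the function works on any sequences, e.g. lists of query labels.
print(levenshtein("kitten", "sitting"))
print(levenshtein(["q1", "q2", "q3"], ["q1", "q3"]))
```

Keeping only the previous row bounds memory by the length of the shorter input if the arguments are ordered accordingly; a session-level variant would replace the `x != y` test with a cost derived from a query similarity function.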

References

Abiteboul S, Hull R, Vianu V (1995) Foundations of databases. Addison-Wesley, Reading
Agrawal R, Rantzau R, Terzi E (2006) Context-sensitive ranking. In: Proceedings ACM SIGMOD international conference on management of data. Chicago, IL, pp 383–394
Akbarnejad J, Chatzopoulou G, Eirinaki M, Koshy S, Mittal S, On D, Polyzotis N, Varman JSV (2010) SQL QueRIE recommendations. PVLDB 3(2):1597–1600
Aligon J, Golfarelli M, Marcel P, Rizzi S, Turricchia E (2011) Mining preferences from OLAP query logs for proactive personalization. In: Proceedings ADBIS. Vienna, Austria, pp 84–97
Aouiche K, Jouve P-E, Darmont J (2006) Clustering-based materialized view selection in data warehouses. In: Proceedings ADBIS. Thessaloniki, Greece, pp 81–95
Baikousi E, Rogkakos G, Vassiliadis P (2011) Similarity measures for multidimensional data. In: Proceedings ICDE. Hannover, Germany, pp 171–182
Brown PF, Pietra VJD, de Souza PV, Lai JC, Mercer RL (1992) Class-based n-gram models of natural language. Comput Linguist 18(4):467–479
Bustos B, Skopal T (2011) Non-metric similarity search problems in very large collections. In: Proceedings ICDE. Hannover, Germany, pp 1362–1365
Chatzopoulou G, Eirinaki M, Koshy S, Mittal S, Polyzotis N, Varman JSV (2011) The QueRIE system for personalized query recommendations. IEEE Data Eng Bull 34(2):55–60
Chatzopoulou G, Eirinaki M, Polyzotis N (2009) Query recommendations for interactive database exploration. In: Proceedings SSDBM. New Orleans, LA, pp 3–18
Cohen WW, Ravikumar PD, Fienberg SE (2003) A comparison of string distance metrics for name-matching tasks. In: Proceedings IJCAI-03 workshop on information integration on the web. Acapulco, Mexico, pp 73–78
Drosou M, Pitoura E (2011) ReDRIVE: result-driven database exploration through recommendations. In: Proceedings CIKM. Glasgow, UK, pp 1547–1552
Garcia-Molina H, Ullman JD, Widom JD (2008) Database systems: the complete book, 2nd edn. Prentice Hall, Englewood Cliffs
Ghosh A, Parikh J, Sengar VS, Haritsa JR (2002) Plan selection based on query clustering. In: Proceedings VLDB. Hong Kong, China, pp 179–190
Giacometti A, Marcel P, Negre E (2009) Recommending multidimensional queries. In: Proceedings DaWaK. Linz, Austria, pp 453–466
Golfarelli M (2003) Handling large workloads by profiling and clustering. In: Proceedings DaWaK. Prague, Czech Republic, pp 212–223
Golfarelli M, Rizzi S, Biondi P (2011) myOLAP: an approach to express and evaluate OLAP preferences. IEEE TKDE 23(7):1050–1064
Grossman D, Frieder O (2004) Information retrieval: algorithms and heuristics. Springer, Berlin
Gupta A, Mumick I (1999) Materialized views: techniques, implementations, and applications. MIT Press, Cambridge
Khoussainova N, Kwon Y, Balazinska M, Suciu D (2010) SnipSuggest: context-aware autocompletion for SQL. PVLDB 4(1):22–33
Khoussainova N, Kwon Y, Liao W-T, Balazinska M, Gatterbauer W, Suciu D (2011) Session-based browsing for more effective query reuse. In: Proceedings SSDBM. Portland, OR, pp 583–585
Li H, Durbin R (2010) Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics 26(5):589–595
Minnesota Population Center (2008) Integrated public use microdata series. http://www.ipums.org
Monge AE, Elkan C (1997) An efficient domain-independent algorithm for detecting approximately duplicate database records. In: Proceedings workshop on research issues on data mining and knowledge discovery
Moreau E, Yvon F, Cappé O (2008) Robust similarity measures for named entities matching. In: Proceedings international conference on computational linguistics. Manchester, UK, pp 593–600
Navarro G (2001) A guided tour to approximate string matching. ACM Comput Surv 33(1):31–88
Ögüdücü SG (2010) Web page recommendation models: theory and algorithms. In: Synthesis lectures on data management. Morgan & Claypool Publishers
Ristad ES, Yianilos PN (1998) Learning string-edit distance. IEEE Trans Pattern Anal Mach Intell 20(5):522–532
Sapia C (2000) PROMISE: predicting query behavior to enable predictive caching strategies for OLAP systems. In: Proceedings DaWaK. London, UK, pp 224–233
Smith T, Waterman M (1981) Identification of common molecular subsequences. J Mol Biol 147:195–197
Stefanidis K, Drosou M, Pitoura E (2009) “You May Also Like” results in relational databases. In: Proceedings international workshop on personalized access, profile management and context awareness: Databases. Lyon, France
Wagner R, Fischer M (1974) The string-to-string correction problem. J ACM 21(1):168–173
Yang X, Procopiuc CM, Srivastava D (2009) Recommending join queries via query log analysis. In: Proceedings ICDE. Shanghai, China, pp 964–975
Yao Q, An A, Huang X (2005) Finding and analyzing database user sessions. In: Proceedings DASFAA. Beijing, China, pp 851–862

Author information

Authors and Affiliations

Laboratoire d’Informatique, Université François Rabelais, Tours, France
Julien Aligon & Patrick Marcel
DISI, University of Bologna, Viale Risorgimento 2, 40136, Bologna, Italy
Matteo Golfarelli, Stefano Rizzi & Elisa Turricchia

Corresponding author

Correspondence to Stefano Rizzi.

About this article

Cite this article

Aligon, J., Golfarelli, M., Marcel, P. et al. Similarity measures for OLAP sessions. Knowl Inf Syst 39, 463–489 (2014). https://doi.org/10.1007/s10115-013-0614-1

Received: 03 July 2012
Revised: 22 January 2013
Accepted: 17 February 2013
Published: 09 March 2013
Issue Date: May 2014
DOI: https://doi.org/10.1007/s10115-013-0614-1
