“Tell Me More”: Finding Related Items from User Provided Feedback

  • Jeroen De Knijf
  • Anthony Liekens
  • Bart Goethals
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6926)

Abstract

The results returned by a search, data mining or database engine often contain an overload of potentially interesting information. Picking out the useful information is a daunting and challenging problem for the user. In this paper we propose an interactive framework to efficiently explore and (re)rank the objects retrieved by such an engine, according to feedback provided on part of the initially retrieved objects. In particular, given a set of objects, a similarity measure applicable to the objects, and an initial set of objects that are of interest to the user, our algorithm computes the k most similar objects. This problem, previously coined ‘cluster on demand’ [10], is solved by transforming the data into a weighted graph. On this weighted graph we compute a relevance score between the initial set of nodes and the remaining nodes, based upon random walks with restart in graphs. We apply our algorithm “Tell Me More” (TMM) to text, numerical and zero/one data. The results show that TMM significantly outperforms a k-nearest neighbor approach in almost every experiment.
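The pipeline described in the abstract (similarity measure, weighted graph, random walk with restart from the user-selected objects, top-k ranking) can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine similarity, the restart probability of 0.15, the fully connected graph and all function names are assumptions chosen for illustration only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two numeric vectors (an assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_graph(objects, similarity):
    """Symmetric weight matrix: edge weight = similarity of the two objects."""
    n = len(objects)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            weights[i][j] = weights[j][i] = similarity(objects[i], objects[j])
    return weights


def random_walk_with_restart(weights, seeds, restart=0.15, iters=200, tol=1e-12):
    """Iterate r = (1 - c) * P r + c * e until the scores stop changing.

    P is the column-normalised weight matrix, e is uniform over the seed
    nodes, and c is the restart probability (0.15 is an assumed value).
    """
    n = len(weights)
    col_sums = [sum(weights[i][j] for i in range(n)) for j in range(n)]
    e = [0.0] * n
    for s in seeds:
        e[s] = 1.0 / len(seeds)
    r = e[:]
    for _ in range(iters):
        new_r = []
        for i in range(n):
            walk = sum(
                weights[i][j] / col_sums[j] * r[j]
                for j in range(n)
                if col_sums[j] > 0
            )
            new_r.append((1 - restart) * walk + restart * e[i])
        delta = sum(abs(a - b) for a, b in zip(new_r, r))
        r = new_r
        if delta < tol:
            break
    return r


def top_k_related(objects, seeds, k, similarity=cosine, restart=0.15):
    """Indices of the k non-seed objects with the highest relevance score."""
    weights = build_graph(objects, similarity)
    scores = random_walk_with_restart(weights, seeds, restart)
    seed_set = set(seeds)
    candidates = [i for i in range(len(objects)) if i not in seed_set]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    return candidates[:k]


# Toy data: objects 0-2 point in roughly one direction, objects 3-4 in another.
objects = [(1.0, 0.0), (0.9, 0.1), (0.8, 0.2), (0.0, 1.0), (0.1, 0.9)]
related = top_k_related(objects, seeds=[0], k=2)
print(related)
```

With object 0 marked as interesting, the sketch ranks the remaining objects by their random-walk relevance to it. A k-nearest neighbor baseline would instead sort the candidates directly by their similarity to the seed, without propagating scores through the graph.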

References

  1. Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., Verkamo, A.: Fast discovery of association rules. In: ADMA, pp. 307–328 (1996)
  2. Cilibrasi, R., Vitányi, P., Wolf, R.: Algorithmic clustering of music. In: 4th International Conference on WEB Delivering of Music, pp. 110–117 (2004)
  3. Coenen, F.: The lucs-kdd discretised/normalised arm and carm data library
  4. Craswell, N., Szummer, M.: Random walks on the click graph. In: SIGIR, pp. 239–246 (2007)
  5. Dean, J., Henzinger, M.: Finding related pages in the world wide web. Computer Networks 31(11-16), 1467–1479 (1999)
  6. Denoyer, L., Gallinari, P.: The Wikipedia XML Corpus. SIGIR Forum 40(1), 64–69 (2006)
  7. Denoyer, L., Gallinari, P.: Report on the XML mining track at inex 2005 and inex 2006: categorization and clustering of XML documents. SIGIR Forum 41(1), 79–90 (2007)
  8. Faloutsos, C., Megalooikonomou, V.: On data mining, compression, and Kolmogorov complexity. Data Mining and Knowledge Discovery 15(1), 3–20 (2007)
  9. Fuxman, A., Tsaparas, P., Achan, K., Agrawal, R.: Using the wisdom of the crowds for keyword generation. In: WWW (2008)
  10. Ghahramani, Z., Heller, K.: Bayesian sets. In: Advances in Neural Information Processing Systems (2005)
  11. Haveliwala, T.: Topic-sensitive pagerank: A context-sensitive ranking algorithm for web search. IEEE Trans. Knowl. Data Eng. 15(4), 784–796 (2003)
  12. Haveliwala, T., Gionis, A., Klein, D., Indyk, P.: Evaluating strategies for similarity search on the web. In: WWW, pp. 432–442 (2002)
  13. De Knijf, J.: Mining tree patterns with almost smallest supertrees. In: SIAM International Conference on Data Mining. SIAM, Philadelphia (2008)
  14. Li, M., Chen, X., Li, X., Ma, B., Vitányi, P.: The similarity metric. IEEE Transactions on Information Theory 50(12), 3250–3264 (2004)
  15. Newman, D., Hettich, S., Blake, C., Merz, C.: UCI repository of machine learning databases (1998)
  16. Onuma, K., Tong, H., Faloutsos, C.: Tangent: a novel, ’surprise me’, recommendation algorithm. In: KDD, pp. 657–666 (2009)
  17. Page, L., Brin, S., Motwani, R., Winograd, T.: The pagerank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project (1998)
  18. Pan, J., Yang, H., Faloutsos, C., Duygulu, P.: Automatic multimedia cross-modal correlation discovery. In: KDD, pp. 653–658 (2004)
  19. Salton, G., McGill, M.: Introduction to Modern Information Retrieval. McGraw-Hill, Inc., New York (1986)
  20. Siebes, A., Vreeken, J., van Leeuwen, M.: Item sets that compress. In: SIAM International Conference on Data Mining (2006)
  21. Sun, J., Qu, H., Chakrabarti, D., Faloutsos, C.: Neighborhood formation and anomaly detection in bipartite graphs. In: IEEE Intl. Conf. on Data Mining, pp. 418–425 (2005)
  22. Tong, H., Faloutsos, C.: Center-piece subgraphs: problem definition and fast solutions. In: KDD, pp. 404–413 (2006)
  23. Tong, H., Faloutsos, C., Pan, J.: Random walk with restart: fast solutions and applications. Knowl. Inf. Syst. 14(3), 327–346 (2008)
  24. Voorhees, E.: Variations in relevance judgments and the measurement of retrieval effectiveness. In: SIGIR, pp. 315–323 (1998)
  25. Vreeken, J., van Leeuwen, M., Siebes, A.: Krimp: mining itemsets that compress. Data Min. Knowl. Discov. 23(1), 169–214 (2011)
  26. Xin, D., Han, J., Yan, X., Cheng, H.: On compressing frequent patterns. Data & Knowledge Engineering 60(1), 5–29 (2007)

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Jeroen De Knijf
    • 1
  • Anthony Liekens
    • 2
  • Bart Goethals
    • 1
  1. Department of Mathematics and Computer Science, Antwerp University, Belgium
  2. VIB Department of Molecular Genetics, Antwerp University, Belgium
