Pay-as-You-Go Ranking of Schema Mappings Using Query Logs

  • Ruhaila Maskat
  • Norman W. Paton
  • Suzanne M. Embury
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7348)

Abstract

Data integration systems typically use mappings to capture the relationships between the data resources to be integrated and the integrated representations presented to users. Manual development and maintenance of such mappings is time-consuming and thus costly. Pay-as-you-go approaches to data integration support the automatic construction of initial mappings, which are generally of rather poor quality, for refinement in the light of user feedback. However, the automatic approaches that produce these mappings typically generate multiple, overlapping candidate mappings. To present the most relevant set of results for user queries, the mappings have to be ranked. We propose a ranking technique that uses information from query logs to discriminate among candidate mappings. The technique is evaluated in terms of how quickly stable rankings can be produced and how the rankings track query patterns that are skewed towards specific sources.

Keywords

Schema Mapping, Ranking, Implicit Feedback, Dataspaces, Pay-as-you-go, Data Integration

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Ruhaila Maskat
    • 1
  • Norman W. Paton
    • 1
  • Suzanne M. Embury
    • 1
  1. School of Computer Science, University of Manchester, Manchester, United Kingdom
