
Pay-as-You-Go Ranking of Schema Mappings Using Query Logs

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 7348)

Abstract

Data integration systems typically use mappings to capture the relationships between the data resources being integrated and the integrated representations presented to users. Developing and maintaining such mappings manually is time-consuming and therefore costly. Pay-as-you-go approaches to data integration support the automatic construction of initial mappings, which are generally of rather poor quality, and refine them in the light of user feedback. However, automatic approaches typically generate multiple, overlapping candidate mappings. To present the most relevant results for user queries, these mappings have to be ranked. We propose a ranking technique that uses information from query logs to discriminate among candidate mappings. The technique is evaluated in terms of how quickly stable rankings can be produced and how well the rankings track query patterns that are skewed towards specific sources.
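The abstract does not spell out the scoring function, so the following is only a minimal sketch of the general idea of ranking candidate mappings from a query log, not the authors' method. It assumes each candidate mapping can be described by the set of source terms it draws on and each logged query by the terms it mentions; the function name `rank_mappings`, the data layout, and the sample identifiers are all hypothetical.

```python
from collections import Counter


def rank_mappings(mappings, query_log):
    """Rank candidate mappings by how often their source terms occur in the log.

    mappings:  dict of mapping name -> set of source terms (assumed format)
    query_log: list of queries, each a list of terms (assumed format)
    """
    # Count how often each term appears across all logged queries.
    term_counts = Counter(term for query in query_log for term in query)

    # A mapping's score is the total log frequency of the terms it covers.
    scores = {
        name: sum(term_counts[term] for term in terms)
        for name, terms in mappings.items()
    }

    # Highest score first; ties are broken by name so the order is deterministic.
    return sorted(scores, key=lambda name: (-scores[name], name))


# Hypothetical candidate mappings and query log, for illustration only.
mappings = {
    "m1": {"source_a.sample", "source_a.platform"},
    "m2": {"source_b.experiment"},
    "m3": {"source_c.experiment", "source_a.sample"},
}
query_log = [
    ["source_a.sample", "source_a.platform"],
    ["source_a.sample"],
    ["source_b.experiment"],
]

ranking = rank_mappings(mappings, query_log)
print(ranking)
```

In this toy log the queries are skewed towards `source_a`, so mappings that draw on it rise to the top. Re-running the function as the log grows is one simple way to observe when the ranking stops changing.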




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Maskat, R., Paton, N.W., Embury, S.M. (2012). Pay-as-You-Go Ranking of Schema Mappings Using Query Logs. In: Bodenreider, O., Rance, B. (eds) Data Integration in the Life Sciences. DILS 2012. Lecture Notes in Computer Science, vol 7348. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31040-9_4


  • DOI: https://doi.org/10.1007/978-3-642-31040-9_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31039-3

  • Online ISBN: 978-3-642-31040-9

  • eBook Packages: Computer Science, Computer Science (R0)
