TupleRank: Ranking Discovered Content in Virtual Databases

  • Jacob Berlin
  • Amihai Motro
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4032)


Recently, the problem of data integration has been newly addressed by methods based on machine learning and discovery. Such methods are intended to automate, at least in part, the laborious process of information integration, by which existing data sources are incorporated in a virtual database. Essentially, these methods scan new data sources, attempting to discover possible mappings to the virtual database. Like all discovery processes, this process is intrinsically probabilistic; that is, each discovery is associated with a specific value that denotes assurance of its appropriateness. Consequently, the rows in a discovered virtual table have mixed assurance levels, with some rows being more credible than others. We argue that rows in discovered virtual databases should be ranked, and we describe a ranking method, called TupleRank, for calculating such a ranking order. Roughly speaking, TupleRank calibrates the probabilities calculated during a discovery process with historical information about the performance of the system. The work is done in the framework of the Autoplex system for discovering content for virtual databases, and initial experimentation is reported and discussed.
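The calibration idea in the abstract can be illustrated with a minimal sketch. It is not the authors' TupleRank algorithm, whose details are not given here. It assumes a simple binning scheme: past discoveries are grouped by their raw probability, each bin's observed accuracy becomes the calibrated assurance, and rows are then sorted by that value. The names `build_calibration`, `calibrate`, and `rank_rows`, the bin count, and the sample data are all hypothetical.

```python
def build_calibration(history, n_bins=5):
    """Summarize historical performance (assumed format, not from the paper).

    history: list of (raw_probability, was_correct) pairs from past discoveries.
    Returns one observed accuracy per probability bin, or None for empty bins.
    """
    hits = [0] * n_bins
    totals = [0] * n_bins
    for p, correct in history:
        b = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into the last bin
        totals[b] += 1
        hits[b] += int(correct)
    return [h / t if t else None for h, t in zip(hits, totals)]


def calibrate(p, table):
    """Map a raw probability to the observed accuracy of its bin.

    Falls back to the raw probability when the bin has no history.
    """
    b = min(int(p * len(table)), len(table) - 1)
    return table[b] if table[b] is not None else p


def rank_rows(rows, table):
    """rows: list of (row, raw_probability) pairs.

    Returns (calibrated_assurance, row) pairs, most credible first.
    """
    scored = [(calibrate(p, table), row) for row, p in rows]
    return sorted(scored, key=lambda s: s[0], reverse=True)


# Hypothetical history: high raw probabilities were right only half the time,
# while mid-range ones were always right.
history = [(0.9, True), (0.95, False), (0.85, True), (0.85, False),
           (0.5, True), (0.55, True)]
table = build_calibration(history)
ranking = rank_rows([("a", 0.9), ("b", 0.5), ("c", 0.1)], table)
```

In this made-up example, row "b" outranks row "a" even though its raw probability is lower, because the history suggests the system is overconfident in the highest bin.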


Discovery Process, Assurance Score, Assurance Measurement, Internet Search Engine, Constraint Checker
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jacob Berlin (1)
  • Amihai Motro (1)
  1. Information and Software Engineering Department, George Mason University, Fairfax, USA
