Autoplex: Automated Discovery of Content for Virtual Databases

  • Jacob Berlin
  • Amihai Motro
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2172)

Abstract

Most virtual database systems are suitable for environments in which the set of member information sources is small and stable. Consequently, present virtual database systems do not scale up very well. The main reason is the complexity and cost of incorporating new information sources into the virtual database. In this paper we describe a system, called Autoplex, which uses machine learning techniques for automating the discovery of new content for virtual database systems. Autoplex assumes that several information sources have already been incorporated (“mapped”) into the virtual database system by human experts (as done in standard virtual database systems). Autoplex learns the features of these examples. It then applies this knowledge to new candidate sources, trying to infer views that “resemble” the examples. In this paper we report initial results from the Autoplex project.

Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Jacob Berlin (1)
  • Amihai Motro (1)
  1. Information and Software Engineering Department, George Mason University, Fairfax
