Automatically Mapping and Integrating Multiple Data Entry Forms into a Database

  • Yuan An
  • Ritu Khare
  • Il-Yeol Song
  • Xiaohua Hu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6998)

Abstract

Forms are a standard way of gathering data into a database. Many applications need to support multiple users with evolving data-gathering requirements, so it is desirable to link dynamic forms to the back-end database automatically. We have developed the FormMapper system, a fully automatic solution that accepts user-created data entry forms and maps and integrates them into an existing database in the same domain. The solution comprises two components: tree extraction and form integration. The tree extraction component uses a probabilistic model, the Hidden Markov Model (HMM), to automatically extract the semantic tree structure of a form. In the form integration component, we develop a merging procedure that maps and integrates a tree into an existing database and extends the database with desired properties. We conducted experiments evaluating the performance of the system on several large databases designed from a number of complex forms. Our experimental results show that the FormMapper system is promising: given the same set of forms, it generated databases that are highly similar (87% overlap) to those generated by human experts.
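To make the tree extraction idea more concrete, the following is a minimal sketch of how an HMM could assign semantic roles to a sequence of form elements using standard Viterbi decoding. It is not the FormMapper implementation: the state names (`heading`, `label`, `input`), the observation symbols, and every probability below are hypothetical values chosen only for illustration.

```python
import math

# Hypothetical hidden states (semantic roles) and model parameters.
# None of these values come from the paper; they are illustrative only.
STATES = ["heading", "label", "input"]
START_P = {"heading": 0.6, "label": 0.3, "input": 0.1}
TRANS_P = {
    "heading": {"heading": 0.1, "label": 0.8, "input": 0.1},
    "label": {"heading": 0.1, "label": 0.1, "input": 0.8},
    "input": {"heading": 0.2, "label": 0.6, "input": 0.2},
}
EMIT_P = {
    "heading": {"bold_text": 0.8, "plain_text": 0.15, "textbox": 0.05},
    "label": {"bold_text": 0.2, "plain_text": 0.75, "textbox": 0.05},
    "input": {"bold_text": 0.05, "plain_text": 0.05, "textbox": 0.9},
}


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations."""
    if not observations:
        return []
    # Each column maps state -> (best log-probability, predecessor state).
    columns = [{
        s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), None)
        for s in states
    }]
    for obs in observations[1:]:
        prev = columns[-1]
        column = {}
        for s in states:
            # Pick the predecessor that maximizes the path score into s.
            score, parent = max(
                (prev[p][0] + math.log(trans_p[p][s]), p) for p in states
            )
            column[s] = (score + math.log(emit_p[s][obs]), parent)
        columns.append(column)
    # Backtrack from the best final state.
    state = max(states, key=lambda s: columns[-1][s][0])
    path = [state]
    for column in reversed(columns[1:]):
        state = column[state][1]
        path.append(state)
    path.reverse()
    return path


# A form flattened into a sequence of visual element types (hypothetical).
elements = ["bold_text", "plain_text", "textbox", "plain_text", "textbox"]
print(viterbi(elements, STATES, START_P, TRANS_P, EMIT_P))
```

With these illustrative parameters the decoder labels the sequence as a heading followed by two label/input pairs. A tree could then be assembled by nesting each label/input pair under the preceding heading; how the actual system builds and merges its trees is described in the full paper, not here.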

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Yuan An (1)
  • Ritu Khare (1)
  • Il-Yeol Song (1)
  • Xiaohua Hu (1)

  1. College of Information Science and Technology, Drexel University, USA
