From Data Integration to Big Data Integration

  • Sonia Bergamaschi
  • Domenico Beneventano
  • Federica Mandreoli
  • Riccardo Martoglia
  • Francesco Guerra
  • Mirko Orsini
  • Laura Po
  • Maurizio Vincini
  • Giovanni Simonini
  • Song Zhu
  • Luca Gagliardelli
  • Luca Magnotta
Chapter

Abstract

The research activities of the Database Group (DBGroup, www.dbgroup.unimore.it) and the Information System Group (ISGroup, www.isgroup.unimore.it) have been mainly devoted to the Data Integration Research Area. The DBGroup designed and developed the MOMIS data integration system, giving rise to a successful innovative enterprise, DataRiver (www.datariver.it), which distributes MOMIS as open source. MOMIS provides integrated access to structured and semistructured data sources and allows a user to pose a single query and receive a single unified answer. Description Logics, automatic annotation of schemata, and clustering techniques constitute its theoretical framework. In the context of data integration, the ISGroup addressed problems related to the management and querying of heterogeneous data sources in large-scale and dynamic scenarios. The reference architectures are Peer Data Management Systems and their evolution toward dataspaces. In these contexts, the ISGroup proposed and evaluated effective and efficient mechanisms for network creation with limited information loss, as well as solutions for mapping management, query reformulation and processing, and query routing. The main issues of data integration have been addressed within European and national projects: automatic annotation, mapping discovery, global query processing, provenance, multidimensional information integration, and keyword search. With the emerging requirements of integrating linked open data, textual data, and multimedia data in a big data scenario, the research has turned to the Big Data Integration Research Area. In particular, the most relevant research results achieved are a scalable entity resolution method, a scalable join operator, and LODEX, a tool for automatically extracting metadata from Linked Open Data (LOD) resources and for visual query formulation on LOD resources. Moreover, in collaboration with DataRiver, data integration was successfully applied to smart e-health.

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Sonia Bergamaschi (1)
  • Domenico Beneventano (1)
  • Federica Mandreoli (3)
  • Riccardo Martoglia (3)
  • Francesco Guerra (1)
  • Mirko Orsini (2)
  • Laura Po (1)
  • Maurizio Vincini (1)
  • Giovanni Simonini (1)
  • Song Zhu (4)
  • Luca Gagliardelli (4)
  • Luca Magnotta (2, 4)
  1. Dipartimento di Ingegneria Enzo Ferrari, Università di Modena e Reggio Emilia, Modena, Italy
  2. Datariver s.r.l., Modena, Italy
  3. FIM, Università di Modena e Reggio, Modena, Italy
  4. ICT School, Università di Modena e Reggio, Modena, Italy
