Matching Techniques for Data Integration and Exploration: From Databases to Big Data

  • Silvana Castano
  • Alfio Ferrara
  • Stefano Montanelli
Part of the Studies in Big Data book series (SBD, volume 31)


Over the last two decades, data matching has been addressed for different purposes and in different application contexts, ranging from data integration to ontology evolution and semantic data clouding, up to the more recent exploratory analysis of large/big datasets. This paper describes the evolution of research on matching techniques for data integration and exploration at the ISLab group of the Università degli Studi di Milano. We analyze these matching techniques according to the structure of the target data, the algorithmic pattern of the matching process, and the application focus, and we discuss the results of applying our techniques to the exploratory analysis of a real dataset consisting of all publications in the SEBD proceedings from 1993 to 2016.


Keywords: Matching techniques, Data integration, Data exploration, Big data
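The abstract classifies matching techniques partly by the algorithmic pattern of the matching process. As a generic illustration only, and not a reproduction of the ISLab techniques the chapter describes, the most basic pattern compares every pair of elements from two sources with a similarity function and keeps the pairs whose score reaches a threshold. This sketch assumes token-based Jaccard similarity over element labels; the function names `tokens`, `jaccard`, and `match` and the default threshold of 0.5 are hypothetical choices made for the example.

```python
import re
from itertools import product


def tokens(label: str) -> set[str]:
    """Split a label into lowercase word tokens (handles camelCase, snake_case, spaces)."""
    # Insert a space between a lowercase letter and a following capital: "customerName" -> "customer Name".
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", label)
    # Split on any run of non-alphanumeric characters and drop empty strings.
    return {t for t in re.split(r"[^a-z0-9]+", spaced.lower()) if t}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match(source: list[str], target: list[str],
          threshold: float = 0.5) -> list[tuple[str, str, float]]:
    """Hypothetical pairwise matcher: score every source/target pair, keep those >= threshold."""
    results = []
    for s, t in product(source, target):
        score = jaccard(tokens(s), tokens(t))
        if score >= threshold:
            results.append((s, t, score))
    # Highest-scoring candidates first; ties keep their original pair order (stable sort).
    return sorted(results, key=lambda r: r[2], reverse=True)


if __name__ == "__main__":
    for s, t, score in match(["customerName", "order_date"],
                             ["name of customer", "date of order", "price"]):
        print(f"{s} <-> {t}: {score:.2f}")
```

Exhaustive pairwise comparison is quadratic in the number of elements, which is one reason matching over large/big datasets typically calls for other algorithmic patterns such as blocking or clustering before comparison.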



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Silvana Castano
    • 1
  • Alfio Ferrara
    • 1
  • Stefano Montanelli
    • 1
    Email author
  1. 1.Università Degli Studi di MilanoMilanItaly
