Breaking the Deadlock: Simultaneously Discovering Attribute Matching and Cluster Matching with Multi-Objective Metaheuristics

Liu, Haishan; Dou, Dejing; Wang, Hao

doi:10.1007/s13740-012-0010-0

Original Article
Published: 04 August 2012

Volume 1, pages 133–145, (2012)

Journal on Data Semantics

Abstract

In this paper, we present a data mining approach to address challenges in the matching of heterogeneous datasets. In particular, we propose solutions to two problems that arise when integrating results from different scientific research efforts. The first problem, attribute matching, involves discovering correspondences among distinct numeric features (attributes) that characterize datasets collected and analyzed in different research labs. The second problem, cluster matching, involves discovering matchings between patterns (clusters) across datasets. We treat the two problems jointly as a single multi-objective optimization problem. We describe a multi-objective simulated annealing algorithm for finding the optimal solution and compare it with a genetic algorithm. The utility of this approach is demonstrated in a series of experiments using synthetic and realistic datasets designed to simulate heterogeneous data from different sources.
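The abstract does not spell out the objective functions or the annealing schedule, so the following is only a minimal sketch of how a Pareto-based multi-objective simulated annealing search over one-to-one matchings might look, not the authors' algorithm. The two cost matrices, the swap neighborhood, the acceptance rule, and all names (`dominates`, `update_archive`, `evaluate`, `mosa`) are assumptions made for illustration.

```python
import math
import random


def dominates(a, b):
    """True if objective vector a is no worse than b in every objective
    and strictly better in at least one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_archive(archive, candidate):
    """Add (solution, objectives) to the archive unless an archived entry
    dominates or equals it; drop archived entries that it dominates."""
    sol, obj = candidate
    if any(dominates(o, obj) or o == obj for _, o in archive):
        return archive
    kept = [(s, o) for s, o in archive if not dominates(obj, o)]
    kept.append((sol, obj))
    return kept


def evaluate(perm, cost_a, cost_b):
    """Hypothetical objectives: total cost of matching i -> perm[i]
    under each of two assumed cost matrices."""
    return (sum(cost_a[i][j] for i, j in enumerate(perm)),
            sum(cost_b[i][j] for i, j in enumerate(perm)))


def mosa(cost_a, cost_b, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Toy multi-objective simulated annealing over permutations.
    Returns an archive of mutually non-dominated (matching, objectives) pairs."""
    rng = random.Random(seed)
    n = len(cost_a)  # assumes square matrices with n >= 2
    current = list(range(n))
    rng.shuffle(current)
    cur_obj = evaluate(current, cost_a, cost_b)
    archive = [(tuple(current), cur_obj)]
    temp = t0
    for _ in range(steps):
        # Neighbor: swap the targets of two randomly chosen source items.
        i, j = rng.sample(range(n), 2)
        neighbor = current[:]
        neighbor[i], neighbor[j] = neighbor[j], neighbor[i]
        new_obj = evaluate(neighbor, cost_a, cost_b)
        if not dominates(cur_obj, new_obj):
            accept = True  # better or incomparable moves are always taken
        else:
            # Dominated move: accept with a probability that shrinks as the
            # summed worsening grows and as the temperature falls.
            delta = sum(nw - cu for nw, cu in zip(new_obj, cur_obj))
            accept = rng.random() < math.exp(-delta / temp)
        if accept:
            current, cur_obj = neighbor, new_obj
            archive = update_archive(archive, (tuple(current), cur_obj))
        temp *= cooling
    return archive
```

Calling `mosa(cost_a, cost_b)` with two square cost matrices of the same size returns a set of trade-off matchings rather than a single answer, which is the usual output of a multi-objective search; choosing one matching from that set is left to the caller.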

Author information

Authors and Affiliations

Computer and Information Science Department, University of Oregon, Eugene, OR, 97403, USA
Haishan Liu, Dejing Dou & Hao Wang

Corresponding author

Correspondence to Haishan Liu.

About this article

Cite this article

Liu, H., Dou, D. & Wang, H. Breaking the Deadlock: Simultaneously Discovering Attribute Matching and Cluster Matching with Multi-Objective Metaheuristics. J Data Semant 1, 133–145 (2012). https://doi.org/10.1007/s13740-012-0010-0

Received: 01 September 2011
Revised: 16 June 2012
Accepted: 22 June 2012
Published: 04 August 2012
Issue Date: August 2012
DOI: https://doi.org/10.1007/s13740-012-0010-0
