Abstract
In this paper we propose a solution to the problem of measuring similarity for heterogeneous data. The key idea is to define the similarity of a given attribute-value pair as the probability of randomly picking a value pair that is less similar than, or equally similar to, the given pair, in terms of order relations defined appropriately for each data type. The similarities of attribute-value pairs are then integrated into similarities between data objects using a statistical method. Applying our method in combination with distance-based clustering to real data shows the merit of the proposed approach.
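The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: value pairs are drawn independently, with replacement, from the observed column; numeric values are ordered by absolute difference (smaller is more similar); categorical values use an assumed order in which any match beats any mismatch and a match on a rarer value beats a match on a commoner one; and, since the abstract does not name the "statistical method", Fisher's method for combining probabilities is swapped in. All function names are hypothetical.

```python
import math
from collections import Counter


def numeric_similarity(values, a, b):
    """P(random pair (X, Y) is no more similar than (a, b)).

    Assumption: X and Y are drawn independently with replacement from
    `values`, and similarity is ordered by |x - y| (smaller = more similar).
    Naive O(n^2) count over all ordered pairs.
    """
    d = abs(a - b)
    n = len(values)
    hits = sum(1 for x in values for y in values if abs(x - y) >= d)
    return hits / (n * n)


def categorical_similarity(values, a, b):
    """Same probability for a categorical attribute, under an ASSUMED order:
    every match (v, v) is more similar than every mismatch, and a match on a
    rarer value is more similar than a match on a more frequent one."""
    n = len(values)
    freq = {v: c / n for v, c in Counter(values).items()}
    if a != b:
        # All mismatches are tied at the bottom: P(X != Y).
        return 1.0 - sum(p * p for p in freq.values())
    # Remove the mass of matches strictly more similar (rarer) than (a, a).
    pa = freq[a]
    return 1.0 - sum(p * p for p in freq.values() if p < pa)


def fisher_combine(sims):
    """Fisher's method: if the s_i behave like independent uniform
    probabilities, -2 * sum(ln s_i) is chi-square with 2k degrees of
    freedom; return its upper-tail probability as the combined similarity."""
    if any(s <= 0.0 for s in sims):
        return 0.0
    k = len(sims)
    half = -sum(math.log(s) for s in sims)  # equals x / 2 with x = -2 * sum(ln s_i)
    # Closed-form chi-square survival function for even df = 2k:
    # exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total


def object_similarity(data, kinds, i, j):
    """Combine per-attribute similarities of rows i and j.
    `kinds[c]` is "num" for numeric columns, anything else for categorical."""
    sims = []
    for col, kind in enumerate(kinds):
        column = [row[col] for row in data]
        f = numeric_similarity if kind == "num" else categorical_similarity
        sims.append(f(column, data[i][col], data[j][col]))
    return fisher_combine(sims)


if __name__ == "__main__":
    data = [(1.0, "red"), (1.2, "red"), (5.0, "blue"), (5.5, "green")]
    kinds = ["num", "cat"]
    print(object_similarity(data, kinds, 0, 1))  # close values, shared colour
    print(object_similarity(data, kinds, 0, 2))  # distant values, different colour
```

With this toy data the first pair should score higher than the second, since both of its attribute similarities are larger. Fisher's method treats the per-attribute similarities as independent, which real attributes may violate; other rules for combining probabilities (for example Stouffer's or the logit method) could be substituted in `fisher_combine`. The quadratic counting in `numeric_similarity` keeps the sketch readable; sorting the column would make it faster on large data.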
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Le, S., Ho, T. (2004). Measuring the Similarity for Heterogenous Data: An Ordered Probability-Based Approach. In: Suzuki, E., Arikawa, S. (eds) Discovery Science. DS 2004. Lecture Notes in Computer Science, vol 3245. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30214-8_10
DOI: https://doi.org/10.1007/978-3-540-30214-8_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23357-2
Online ISBN: 978-3-540-30214-8
eBook Packages: Springer Book Archive