Coupled K-Nearest Centroid Classification for Non-iid Data
Most traditional classification methods assume that objects, attributes and values are independent and identically distributed (iid). However, real-world data, such as multi-agent data and behavioral data, usually contain strong couplings among values, attributes and objects, which greatly challenge existing methods and tools. This work targets coupling similarities from these three perspectives and designs a novel classification method that applies a weighted K-Nearest Centroid to obtain the coupled similarity for non-iid data. From the value and attribute perspectives, coupled similarity serves as a metric for nominal objects that considers not only intra-coupled similarity within an attribute but also inter-coupled similarity between attributes. From the object perspective, we propose a more effective method that measures the centroid object by connecting all related objects. Extensive experiments on UCI and student data sets reveal that the proposed method achieves higher accuracy than classical methods, especially on imbalanced data.
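To make the distinction between intra-coupled and inter-coupled similarity concrete, the following is a minimal illustrative sketch on toy nominal data. The abstract does not give the formulas, so the two similarity functions below (a frequency-based intra-attribute score and a conditional-distribution overlap for the inter-attribute score), their product combination, and all function and variable names are assumptions made for illustration only and should not be read as the paper's definitions.

```python
from collections import Counter


def intra_similarity(column, x, y):
    """Similarity of values x and y within one attribute, based only on
    how often each value occurs (assumed form; both values must occur)."""
    counts = Counter(column)
    fx, fy = counts[x], counts[y]
    return (fx * fy) / (fx + fy + fx * fy)


def inter_similarity(column, other, x, y):
    """Similarity of x and y as seen through a second attribute: overlap of
    the distributions of `other` conditioned on x and on y (assumed form)."""
    def conditional(value):
        co = Counter(o for c, o in zip(column, other) if c == value)
        total = sum(co.values())
        return {k: n / total for k, n in co.items()}

    px, py = conditional(x), conditional(y)
    return sum(min(px.get(k, 0.0), py.get(k, 0.0)) for k in set(px) | set(py))


def coupled_similarity(column, other, x, y):
    """Combine both views by a simple product (an assumption of this sketch)."""
    return intra_similarity(column, x, y) * inter_similarity(column, other, x, y)


# Hypothetical toy data: two nominal attributes over five objects.
colour = ["red", "red", "blue", "blue", "green"]
shape = ["round", "square", "round", "square", "round"]

print(coupled_similarity(colour, shape, "red", "blue"))
print(coupled_similarity(colour, shape, "red", "green"))
```

In this toy data, "red" and "blue" co-occur with the same shapes in the same proportions, so they score as more similar than "red" and "green", even though plain value matching would treat all three colours as equally different.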
Keywords: Support Vector Machine; Classification Task; Educational Data Mining; Couple Similarity; Couple Distance
This work is sponsored by the Australian Research Council Grants (DP1096218, DP0988016, LP100200774, LP0989721), and Australian Research Council Linkage Grant (LP100200774).