Efficient Clustering for Orders

  • Toshihiro Kamishima
  • Shotaro Akaho
Part of the Studies in Computational Intelligence book series (SCI, volume 165)


Lists of ordered objects are widely used as representational forms; examples include Web search results and best-seller lists. Clustering is a useful data analysis technique for grouping mutually similar objects. To cluster orders, hierarchical clustering methods have been used together with dissimilarities defined between pairs of orders. However, hierarchical clustering methods cannot be applied to large-scale data because their computational cost scales poorly with the number of orders. To avoid this problem, we developed the k-o’means algorithm. This algorithm successfully extracted grouping structures in orders and was computationally efficient with respect to the number of orders. However, it was not efficient when there are too many possible objects. We therefore propose a new method, k-o’means-EBC, grounded in the theory of order statistics. We further propose several techniques for analyzing the acquired clusters of orders.
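The contrast between pairwise hierarchical methods and a k-means-style approach can be illustrated with a small sketch. This is not the authors' k-o’means or k-o’means-EBC: it assumes that every order ranks the same small set of objects (the chapter targets orders over many possible objects), and it stands in the sum of squared rank differences (Spearman-style distance) as the dissimilarity and Borda-count aggregation as the "central order" of a cluster. The helper names `spearman_distance`, `borda_central_order`, and `cluster_orders` are hypothetical.

```python
from typing import Dict, List, Sequence

Order = Sequence[str]  # objects listed from first (most preferred) to last


def ranks(order: Order) -> Dict[str, int]:
    """Map each object to its 0-based position in the order."""
    return {obj: pos for pos, obj in enumerate(order)}


def spearman_distance(a: Order, b: Order) -> int:
    """Sum of squared rank differences; assumes a and b rank the same objects."""
    ra, rb = ranks(a), ranks(b)
    return sum((ra[o] - rb[o]) ** 2 for o in ra)


def borda_central_order(members: List[Order]) -> List[str]:
    """Central order by Borda count: sort objects by total rank (lower = earlier)."""
    total = {o: 0 for o in members[0]}
    for order in members:
        for o, r in ranks(order).items():
            total[o] += r
    # Ties are broken by object name so the result is deterministic.
    return sorted(total, key=lambda o: (total[o], o))


def cluster_orders(orders: List[Order], k: int, max_iter: int = 20) -> List[int]:
    """Hypothetical k-means-style clustering of complete orders (illustration only)."""
    # Farthest-first initialization: start from orders[0], then repeatedly pick
    # the order farthest from its nearest already-chosen center.
    centers: List[Order] = [list(orders[0])]
    while len(centers) < k:
        far = max(orders, key=lambda o: min(spearman_distance(o, c) for c in centers))
        centers.append(list(far))
    labels = [-1] * len(orders)
    for _ in range(max_iter):
        # Assignment step: each order joins the cluster whose center is nearest.
        new_labels = [
            min(range(k), key=lambda c: spearman_distance(o, centers[c])) for o in orders
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not change
        labels = new_labels
        # Update step: recompute the central order of each non-empty cluster.
        for c in range(k):
            members = [o for o, lab in zip(orders, labels) if lab == c]
            if members:
                centers[c] = borda_central_order(members)
    return labels


orders = [
    ["a", "b", "c", "d"],
    ["a", "c", "b", "d"],
    ["d", "c", "b", "a"],
    ["d", "b", "c", "a"],
]
print(cluster_orders(orders, k=2))  # [0, 0, 1, 1]
```

Because each iteration compares every order only with the k centers, it needs N·k distance evaluations for N orders, rather than dissimilarities for all N(N−1)/2 pairs as in hierarchical clustering. Like ordinary k-means, this sketch can settle in a poor local optimum, which is why it uses a deterministic farthest-first initialization here.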





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Toshihiro Kamishima, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Japan
  • Shotaro Akaho, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Japan
