Adaptive Active Classification of Cell Assay Images

  • Nicolas Cebron
  • Michael R. Berthold
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4213)


Classifying large datasets without any a priori information is a problem in many tasks. In bioinformatics in particular, huge unlabeled datasets often have to be explored largely by hand by a biology expert. In this work we consider an application motivated by the development of high-throughput microscope screening cameras, which can produce hundreds of thousands of images per day. We propose a new adaptive active classification scheme that links two opposing concepts: unsupervised clustering of the underlying data and supervised classification. Based on Fuzzy c-means clustering and Learning Vector Quantization, the scheme first clusters a large dataset and then adjusts the classification using a small number of carefully chosen examples. Following the concept of active learning, the learner tries to query the most informative examples during the learning process and thereby keeps the cost of supervision low. We compare our approach on several datasets with Learning Vector Quantization using random example selection and with Support Vector Machines using active learning.
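
The abstract names the ingredients (fuzzy c-means clustering, then Learning Vector Quantization refined on a few actively queried labels) but not the exact selection rule or update schedule. The following is a minimal pure-Python sketch of that general pipeline. It is not the authors' algorithm. Several choices are assumptions of this sketch: prototypes are seeded by farthest-first traversal, each prototype is labeled by querying the data point nearest to it, the "most informative" example is taken to be the unlabeled point whose two largest fuzzy memberships are closest, and refinement uses plain LVQ1 with a constant learning rate. `oracle` stands in for the biology expert, and all function names are hypothetical.

```python
import math


def farthest_first(data, c):
    """Seed c prototypes by farthest-first traversal (greedy k-center heuristic)."""
    protos = [list(data[0])]
    while len(protos) < c:
        far = max(data, key=lambda x: min(math.dist(x, p) for p in protos))
        protos.append(list(far))
    return protos


def memberships(x, protos, m=2.0):
    """Fuzzy c-means membership of point x in each prototype; the values sum to 1."""
    dists = [math.dist(x, p) for p in protos]
    for i, d in enumerate(dists):
        if d == 0.0:  # x sits on a prototype: membership is crisp
            return [1.0 if j == i else 0.0 for j in range(len(protos))]
    e = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** e for dj in dists) for di in dists]


def fcm(data, c, m=2.0, iters=50):
    """Unsupervised step: alternate standard FCM membership and prototype updates."""
    protos = farthest_first(data, c)
    dim = len(data[0])
    for _ in range(iters):
        u = [memberships(x, protos, m) for x in data]
        for i in range(c):
            w = [uk[i] ** m for uk in u]
            total = sum(w)
            protos[i] = [sum(wk * x[d] for wk, x in zip(w, data)) / total
                         for d in range(dim)]
    return protos


def nearest(x, protos):
    """Index of the prototype (or point) in protos closest to x."""
    return min(range(len(protos)), key=lambda i: math.dist(x, protos[i]))


def lvq1_step(x, y, protos, labels, alpha):
    """LVQ1: move the winning prototype toward x if its label matches y, else away."""
    i = nearest(x, protos)
    sign = 1.0 if labels[i] == y else -1.0
    protos[i] = [p + sign * alpha * (xd - p) for p, xd in zip(protos[i], x)]


def margin(x, protos):
    """Gap between the two largest memberships; small means ambiguous (needs c >= 2)."""
    top = sorted(memberships(x, protos), reverse=True)
    return top[0] - top[1]


def active_classify(data, oracle, c, budget, alpha=0.1, epochs=5):
    """Cluster with FCM, then refine with LVQ1 on a few actively queried labels."""
    protos = fcm(data, c)
    queried, labels = {}, []
    # Label each prototype by asking the oracle about its closest data point.
    for p in protos:
        k = nearest(p, data)
        queried[k] = oracle(k)
        labels.append(queried[k])
    for _ in range(budget):
        pool = [k for k in range(len(data)) if k not in queried]
        if not pool:
            break
        # Query the most ambiguous unlabeled point under the current prototypes.
        k = min(pool, key=lambda k: margin(data[k], protos))
        queried[k] = oracle(k)
        # Re-run LVQ1 over every label collected so far.
        for _ in range(epochs):
            for j, y in queried.items():
                lvq1_step(data[j], y, protos, labels, alpha)
    return protos, labels


# Toy usage: two well-separated 3x3 grids; the oracle reveals the true class.
grid = [(0.1 * i, 0.1 * j) for i in range(3) for j in range(3)]
data = grid + [(5 + a, 5 + b) for a, b in grid]
truth = [0] * 9 + [1] * 9
protos, labels = active_classify(data, oracle=lambda k: truth[k], c=2, budget=2)
print([labels[nearest(x, protos)] for x in data] == truth)
```

Farthest-first seeding keeps the sketch deterministic. The margin criterion and the constant LVQ1 rate are simple choices made for illustration, not settings reported in the paper.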


Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Nicolas Cebron (1)
  • Michael R. Berthold (1)
  1. ALTANA Chair for Bioinformatics and Information Mining, Department of Computer and Information Science, University of Konstanz, Konstanz, Germany
