Unsupervised Multiple-Instance Learning for Functional Profiling of Genomic Data

  • Corneliu Henegar
  • Karine Clément
  • Jean-Daniel Zucker
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4212)


Multiple-instance learning (MIL) is a popular concept in the AI community for supporting supervised learning applications in situations where only incomplete knowledge is available. We propose an original reformulation of the MIL concept for the unsupervised context (UMIL), which can serve as a broader framework for clustering data objects that are adequately described by the multiple-instance representation. Three algorithmic solutions are derived from available conventional methods: agglomerative clustering, partition clustering, and MIL's citation-kNN approach. Using standard clustering quality measures, we evaluated these algorithms within a bioinformatic framework to perform a functional profiling of two genomic data sets, after combining expression data and biological annotations into a UMIL representation. Our analysis highlighted meaningful interaction patterns relating biological processes and regulatory pathways into coherent functional modules, uncovering profound features of the biological model. These results indicate UMIL's usefulness in exploring hidden behavioral patterns in complex data.
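To make the idea of clustering multiple-instance objects concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each data object is a "bag" of numeric instance vectors, that bags are compared with the minimal Hausdorff distance (the bag-level distance used by citation-kNN), and that bags are grouped by a naive average-linkage agglomerative procedure. The function names `min_hausdorff` and `agglomerate`, and the toy bags, are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import dist


def min_hausdorff(bag_a, bag_b):
    """Minimal Hausdorff distance: the smallest Euclidean distance
    between any instance of bag_a and any instance of bag_b."""
    return min(dist(a, b) for a in bag_a for b in bag_b)


def agglomerate(bags, n_clusters):
    """Naive average-linkage agglomerative clustering over bags.

    Returns a list of clusters, each a list of bag indices.
    """
    # Precompute all pairwise bag distances, keyed by (i, j) with i < j.
    d = {
        (i, j): min_hausdorff(bags[i], bags[j])
        for i, j in combinations(range(len(bags)), 2)
    }

    def linkage(c1, c2):
        # Average distance between all cross-cluster pairs of bags.
        total = sum(d[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    # Start with one singleton cluster per bag.
    clusters = [[i] for i in range(len(bags))]
    while len(clusters) > n_clusters:
        # Find and merge the two closest clusters (a < b by construction).
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy data (assumed): four bags of 2-D instances, e.g. one bag per gene
# with one instance per annotation-derived feature vector.
bags = [
    [(0.0, 0.0), (0.1, 0.2)],
    [(0.2, 0.1), (5.0, 5.0)],
    [(9.0, 9.0), (9.5, 9.1)],
    [(9.2, 8.8)],
]
clusters = agglomerate(bags, 2)
print(clusters)
```

In this sketch two bags are considered close as soon as any pair of their instances is close, which is why the second bag joins the first despite containing a distant instance. Other bag-level distances, linkage criteria, or a partition-based procedure could be substituted without changing the overall structure.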


Keywords (added by machine, not by the authors): White Adipose Tissue; Annotate Transcript; Silhouette Index; Obese Human Subject; KEGG Category



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Corneliu Henegar (1)
  • Karine Clément (1, 2, 3)
  • Jean-Daniel Zucker (1, 4)

  1. INSERM, UMR U-755 Nutriomique, Hôtel-Dieu, Paris, France
  2. Faculté de Médecine Les Cordeliers, Université Paris VI, Paris, France
  3. AP-HP, Pitié-Salpêtrière, Service de Nutrition, Paris, France
  4. LIM&BIO EA3969, Université Paris Nord, Bobigny, France
