Using Medians to Generate Consensus Rankings for Biological Data

  • Sarah Cohen-Boulakia
  • Alain Denise
  • Sylvie Hamel
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6809)

Abstract

Faced with the deluge of data available in biological databases, scientists find it increasingly difficult to obtain reasonable sets of answers to their biological queries. A critical example appears in medicine, where physicians frequently need information about the genes associated with a given disease. When they pose such queries to Web portals (e.g., Entrez NCBI), they usually get huge numbers of unranked answers, which are very difficult to exploit. In recent years, several ranking approaches have been proposed, but none of them is considered the most promising.

Instead of treating ranking methods as alternative approaches, we propose to generate a consensus ranking that highlights the points a set of rankings have in common while minimizing their disagreements. Our work is based on the concept of median, originally defined on permutations: given m permutations and a distance function, the median problem is to find a permutation that minimizes the sum of its distances to the m given permutations. We have investigated the problem of computing a median of a set of m rankings that may contain different elements and ties, under a generalized Kendall-τ distance. This problem is known to be NP-hard. In this paper, we present a new heuristic for the problem and demonstrate the benefit of our approach on real queries using four different ranking methods.
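
The median problem described above can be made concrete with a small sketch. The Python code below is illustrative only and is not the authors' heuristic. It assumes that a ranking with ties is a list of buckets (best first, elements in a bucket tied), and it scores each pair of elements with a generalized Kendall-τ in which a pair ordered oppositely costs 1 and a pair tied in only one of the two rankings costs a penalty p, in the style of the K^(p) distance of Fagin et al.; the default p = 1 is an assumption. Rankings over different elements are handled by a hypothetical unification step that appends each ranking's missing elements as one final tied bucket. The median is then found by exhaustive search over every ranking with ties, which grows exponentially with the number of elements and illustrates why a heuristic is needed for real queries. All function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterator, List, Optional, Sequence, Tuple

# Assumed representation: buckets from best to worst; elements in one bucket are tied.
Ranking = List[FrozenSet[str]]


def positions(r: Ranking) -> Dict[str, int]:
    """Map each element to the index of the bucket that contains it."""
    return {x: i for i, bucket in enumerate(r) for x in bucket}


def unify(rankings: Sequence[Ranking]) -> List[Ranking]:
    """Hypothetical unification: append the elements a ranking lacks as one last tied bucket."""
    universe = frozenset().union(*(b for r in rankings for b in r))
    unified = []
    for r in rankings:
        missing = universe - frozenset().union(*r)
        unified.append(list(r) + ([missing] if missing else []))
    return unified


def gen_kendall(r: Ranking, s: Ranking, p: float = 1.0) -> float:
    """Generalized Kendall-tau (assumed form): 1 per pair ordered oppositely,
    p per pair tied in exactly one of the two rankings, 0 otherwise."""
    pr, ps = positions(r), positions(s)
    if pr.keys() != ps.keys():
        raise ValueError("rankings must cover the same elements; unify them first")
    d = 0.0
    for a, b in combinations(sorted(pr), 2):
        x, y = pr[a] - pr[b], ps[a] - ps[b]
        if x * y < 0:  # strictly ordered in both, but in opposite directions
            d += 1
        elif (x == 0) != (y == 0):  # tied in one ranking only
            d += p
    return d


def weak_orders(items: FrozenSet[str]) -> Iterator[Ranking]:
    """Enumerate every ranking with ties (ordered set partition) of items."""
    if not items:
        yield []
        return
    elems = sorted(items)
    for k in range(1, len(elems) + 1):
        for first in combinations(elems, k):
            head = frozenset(first)
            for tail in weak_orders(items - head):
                yield [head] + tail


def brute_force_median(rankings: Sequence[Ranking], p: float = 1.0) -> Tuple[Optional[Ranking], float]:
    """Exact median by exhaustive search: exponential, so only usable for a handful of elements."""
    rs = unify(rankings)
    universe = frozenset(positions(rs[0]))
    best: Optional[Ranking] = None
    best_score = float("inf")
    for candidate in weak_orders(universe):
        score = sum(gen_kendall(candidate, r, p) for r in rs)
        if score < best_score:
            best, best_score = candidate, score
    return best, best_score


if __name__ == "__main__":
    r1 = [frozenset({"A"}), frozenset({"B"}), frozenset({"C"}), frozenset({"D"})]
    r2 = [frozenset({"A"}), frozenset({"C"}), frozenset({"B"}), frozenset({"D"})]
    r3 = [frozenset({"A", "B"}), frozenset({"C"})]  # D is missing from this ranking
    print(brute_force_median([r1, r2, r3]))
```

For example, with r1 = A > B > C > D, r2 = A > C > B > D and r3 = {A, B} > C (D missing), the sketch returns r1 as the median with a total distance of 2: r1 is at distance 0 from itself, 1 from r2 (B and C swapped) and 1 from the unified r3 (A and B tied only in r3).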

Availability: http://bioguide-project.net/bioconsert

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Sarah Cohen-Boulakia (1, 2)
  • Alain Denise (1, 2, 3)
  • Sylvie Hamel (4)

  1. LRI (Laboratoire de Recherche en Informatique), CNRS UMR 8623, Université Paris-Sud, France
  2. AMIB Group, INRIA Saclay Ile-de-France, France
  3. IGM (Institut de Génétique et de Microbiologie), CNRS UMR 8621, Université Paris-Sud, France
  4. DIRO (Département d’Informatique et de Recherche Opérationnelle), Université de Montréal, Canada

Personalised recommendations