Extended Spearman and Kendall Coefficients for Gene Annotation List Correlation

  • Davide Chicco
  • Eleonora Ciceri
  • Marco Masseroli
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8623)


Gene annotations are a key concept in bioinformatics, and computational methods able to predict them are a fundamental contribution to the field. Several machine learning algorithms are available in this domain; they include relevant parameters that might influence the output list of predicted gene annotations. The extent to which variation of these key parameters affects the output gene annotation lists remains an open question. Here, we support such evaluation by introducing two list correlation measures, based on and extending the Spearman ρ correlation coefficient and the Kendall τ distance, respectively. Applying these measures to gene annotation lists predicted from Gene Ontology annotation datasets of genes from different organisms revealed interesting patterns among the predicted lists. Additionally, the measures allowed us to draw useful conclusions about the prediction parameters and algorithms used.
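For readers less familiar with the two base measures, the following minimal Python sketch computes the classic (non-extended) Spearman ρ, ρ = 1 − 6Σd²/(n(n² − 1)), and the Kendall τ distance (the number of discordant pairs). It assumes two rankings of the same set of distinct annotation terms with no ties. It does not reproduce the extended coefficients introduced in the paper, whose definitions are not given in this abstract; the function names and term labels are hypothetical.

```python
from itertools import combinations


def _check_same_items(list_a, list_b):
    # The classic measures assume both lists rank exactly the same distinct items.
    if len(list_a) != len(set(list_a)) or set(list_a) != set(list_b):
        raise ValueError("both lists must rank the same distinct items")
    if len(list_a) < 2:
        raise ValueError("at least two items are required")


def spearman_rho(list_a, list_b):
    """Classic Spearman rho for two tie-free rankings (position 0 = top)."""
    _check_same_items(list_a, list_b)
    n = len(list_a)
    rank_b = {item: pos for pos, item in enumerate(list_b)}
    # Sum of squared rank differences d_i^2 over all items
    d2 = sum((pos - rank_b[item]) ** 2 for pos, item in enumerate(list_a))
    return 1 - 6 * d2 / (n * (n * n - 1))


def kendall_tau_distance(list_a, list_b, normalized=False):
    """Number of item pairs ordered differently by the two rankings."""
    _check_same_items(list_a, list_b)
    rank_a = {item: pos for pos, item in enumerate(list_a)}
    rank_b = {item: pos for pos, item in enumerate(list_b)}
    discordant = 0
    for x, y in combinations(list_a, 2):
        # Discordant: the two lists disagree on whether x precedes y
        if (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y]) < 0:
            discordant += 1
    if normalized:
        n = len(list_a)
        return discordant / (n * (n - 1) / 2)
    return discordant


# Hypothetical predicted annotation lists for one gene, from two parameter settings
run_1 = ["term_A", "term_B", "term_C", "term_D"]
run_2 = ["term_B", "term_A", "term_C", "term_D"]
print(spearman_rho(run_1, run_2))          # 1 - 6*2/60 = 0.8
print(kendall_tau_distance(run_1, run_2))  # one swapped pair -> 1
```

Swapping the top two terms yields ρ = 0.8 and a Kendall τ distance of 1. Identical lists give ρ = 1 and distance 0, while fully reversed lists give ρ = −1 and the maximum distance n(n − 1)/2.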


Keywords: Biomolecular annotations · Spearman coefficient · Kendall distance · Top-K queries



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Davide Chicco (1, 2)
  • Eleonora Ciceri (1)
  • Marco Masseroli (1)
  1. Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
  2. Princess Margaret Cancer Centre, University of Toronto, Toronto, Canada