Characterising the Influence of Rule-Based Knowledge Representations in Biological Knowledge Extraction from Transcriptomics Data

  • Simon Baron
  • Nicola Lazzarini
  • Jaume Bacardit
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10199)


A wealth of biotechnologies (e.g. sequencing, proteomics, lipidomics) can now generate a broad range of data types from biological samples. However, the knowledge gained from such data sources is constrained by the limitations of the analytics techniques applied to them. State-of-the-art machine learning algorithms can capture complex patterns with high predictive capacity, but it is often very difficult, if not impossible, to extract human-understandable knowledge from these patterns. In recent years, evolutionary machine learning techniques have proven to be competent methods for biological and biomedical data analytics. They can generate interpretable prediction models and, beyond prediction, extract useful knowledge in the form of biomarkers or biological networks.

This paper thoroughly characterises the impact that a core component of the evolutionary machine learning process, its knowledge representation, has on the extraction of biologically useful knowledge from transcriptomics datasets. Using FuNeL, an evolutionary machine learning-based network inference method, we evaluate several variants of rule knowledge representations on a range of transcriptomics datasets to quantify the volume and complementarity of the knowledge that each of them can extract. Overall, we show that knowledge representations, often considered a minor detail, greatly affect the downstream biological knowledge extraction process.
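As an illustration of what "volume" and "complementarity" could mean in practice, the following minimal sketch compares the gene sets obtained from different rule representations. It is not the evaluation protocol of the paper: the representation names, the gene identifiers, and the choice of the Jaccard index and unique-gene counts as measures are all assumptions made for this example.

```python
from itertools import combinations


def volume_and_complementarity(gene_sets):
    """Compare gene sets extracted under different rule representations.

    gene_sets: dict mapping a (hypothetical) representation name to the set
    of gene identifiers appearing in the network inferred with it.
    Returns (volume, unique, jaccard).
    """
    # Volume: how many genes each representation yields.
    volume = {name: len(genes) for name, genes in gene_sets.items()}

    # Complementarity, view 1: genes found by one representation only.
    unique = {}
    for name, genes in gene_sets.items():
        others = set().union(*(g for n, g in gene_sets.items() if n != name))
        unique[name] = genes - others

    # Complementarity, view 2: pairwise Jaccard overlap (low = complementary).
    # An empty union is scored 0.0, a convention chosen for this sketch.
    jaccard = {}
    for a, b in combinations(sorted(gene_sets), 2):
        union = gene_sets[a] | gene_sets[b]
        jaccard[(a, b)] = len(gene_sets[a] & gene_sets[b]) / len(union) if union else 0.0

    return volume, unique, jaccard


# Hypothetical toy input; real inputs would come from the inferred networks.
example_sets = {
    "rep_A": {"g1", "g2", "g3"},
    "rep_B": {"g2", "g3", "g4"},
    "rep_C": {"g5"},
}
volume, unique, jaccard = volume_and_complementarity(example_sets)
```

In this toy input, `rep_A` and `rep_B` share two of their four combined genes, while `rep_C` contributes a gene that neither of the others finds, which is the kind of complementary contribution the comparison is meant to expose.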


Evolutionary machine learning, Rule knowledge representations, Biological knowledge extraction



This work was supported by the Engineering and Physical Sciences Research Council [EP/N031962/1]. We are grateful to the School of Computing Science of Newcastle University for access to its High Performance Computing Cluster. We thank the anonymous reviewers for their valuable feedback.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. School of Computing Science, Newcastle University, Newcastle upon Tyne, UK
