Abstract
We present a logical aggregation method that, combined with propositionalization methods, constructs novel structured biological features from gene expression data. The aim is to gain insight into pathway mechanisms, for instance those associated with a particular disease. We illustrate the method on the task of distinguishing between two types of lung cancer: squamous cell carcinoma (SCC) and adenocarcinoma (AC). We identify activation patterns in pathways previously implicated in the development of cancer. Our method identified a model whose predictive performance is comparable to that of the winning algorithm of a recent challenge (the SBV Improver Diagnostic Signature Challenge), while also providing biologically relevant explanations that may be useful to biologists.
Acknowledgments
LACM received funding from the Medical Research Council (MC_UU_12013/8).
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Neaves, S.R., Millard, L.A.C., Tsoka, S. (2016). Using ILP to Identify Pathway Activation Patterns in Systems Biology. In: Inoue, K., Ohwada, H., Yamamoto, A. (eds) Inductive Logic Programming. ILP 2015. Lecture Notes in Computer Science, vol. 9575. Springer, Cham. https://doi.org/10.1007/978-3-319-40566-7_10
DOI: https://doi.org/10.1007/978-3-319-40566-7_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-40565-0
Online ISBN: 978-3-319-40566-7
eBook Packages: Computer Science, Computer Science (R0)