Labeling Negative Examples in Supervised Learning of New Gene Regulatory Connections

Conference paper
Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB 2010)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6685)

Abstract

Supervised learning methods have recently been exploited to learn gene regulatory networks from gene expression data. The basic approach consists of building a binary classifier from feature vectors composed of the expression levels of a set of known regulatory connections, available in public databases or reported in the literature. Such a classifier is then used to predict new, previously unknown connections.
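A minimal sketch of this scheme, under stated assumptions: each candidate (regulator, target) pair is represented by concatenating the two genes' expression profiles, and a nearest-centroid classifier stands in for a more capable classifier such as a support vector machine. The functions `pair_features`, `train_nearest_centroid` and `predict`, and the toy expression values, are hypothetical illustrations, not the authors' implementation.

```python
from math import dist

def pair_features(expr, tf, target):
    """Feature vector of a (regulator, target) pair: the two expression profiles concatenated."""
    return expr[tf] + expr[target]

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def train_nearest_centroid(expr, positives, negatives):
    """'Train' by computing one centroid per class from the labeled pairs."""
    pos = centroid([pair_features(expr, a, b) for a, b in positives])
    neg = centroid([pair_features(expr, a, b) for a, b in negatives])
    return pos, neg

def predict(model, expr, tf, target):
    """Predict a regulatory connection if the pair is closer to the positive centroid."""
    pos, neg = model
    x = pair_features(expr, tf, target)
    return dist(x, pos) < dist(x, neg)

# Hypothetical toy data: expression of four genes across three conditions.
expr = {
    "g1": [1.0, 2.0, 3.0],
    "g2": [1.1, 2.1, 2.9],
    "g3": [3.0, 2.0, 1.0],
    "g4": [0.9, 2.2, 3.1],
}
model = train_nearest_centroid(expr, positives=[("g1", "g2")], negatives=[("g1", "g3")])
print(predict(model, expr, "g1", "g4"))  # True
```

The classifier is only as good as its labeled pairs; in particular, the `negatives` argument must be supplied by the user, which is exactly the difficulty the paper addresses.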

The quality of the training set plays a crucial role in such an inference scheme. In binary classification the training set should be composed of both positive and negative examples, but the biological literature usually records only which pairs of genes interact. The complementary information, that two genes do not interact, is usually not reported, because biologists are rarely in a position to state that two genes do not interact.
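To make the labeling problem concrete, the sketch below assumes a hypothetical list of regulators and genes and treats every pair outside the known network as unlabeled. It shows a simple baseline, sampling negatives uniformly from the unlabeled pairs, together with a helper that measures how many of the sampled "negatives" are in fact true interactions when a ground-truth network is available (for example, a simulated one). All function names and data are illustrative assumptions.

```python
import random

def unlabeled_pairs(regulators, genes, positives):
    """All (regulator, gene) pairs, excluding self-pairs, that are not known interactions."""
    known = set(positives)
    return [(r, g) for r in regulators for g in genes
            if r != g and (r, g) not in known]

def naive_negatives(regulators, genes, positives, k, seed=0):
    """Baseline: sample up to k negatives uniformly from the unlabeled pairs."""
    pool = unlabeled_pairs(regulators, genes, positives)
    rng = random.Random(seed)
    return rng.sample(pool, min(k, len(pool)))

def contamination(negatives, true_edges):
    """Fraction of chosen 'negatives' that are actually true interactions."""
    if not negatives:
        return 0.0
    return sum(pair in true_edges for pair in negatives) / len(negatives)

# Hypothetical example: A -> C is a real but unreported interaction.
regulators = ["A"]
genes = ["A", "B", "C", "D"]
known = [("A", "B")]
true_edges = {("A", "B"), ("A", "C")}
negs = naive_negatives(regulators, genes, known, k=2)
print(contamination(negs, true_edges))  # 0.5
```

Because unreported interactions sit in the unlabeled pool, uniform sampling can mislabel them as negatives; the motif-based selection discussed next aims to reduce that contamination.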

The over-representation of topological motifs in currently known gene regulatory networks, such as feed-forward loops, bi-fan clusters, and single-input modules, could guide the selection of reliable negative examples. We introduce, discuss, and evaluate a number of negative selection heuristics that exploit the known gene network topology of Escherichia coli and Saccharomyces cerevisiae.
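The sketch below counts two of the motifs mentioned above (feed-forward loops and bi-fans) in a known directed network, then applies one hypothetical motif-based filter: an unlabeled pair (X, Z) is excluded from the negatives when known edges X→Y and Y→Z already exist, since adding X→Z would complete a feed-forward loop and the pair is therefore a plausible missing interaction. This filter only illustrates the idea; it is not claimed to be one of the heuristics evaluated in the paper, and all names are illustrative.

```python
from collections import defaultdict
from itertools import combinations
from math import comb

def adjacency(edges):
    """Map each regulator to the set of its known targets."""
    out = defaultdict(set)
    for src, dst in edges:
        out[src].add(dst)
    return out

def count_ffl(out):
    """Count feed-forward loops X->Y, Y->Z, X->Z with X, Y, Z distinct."""
    count = 0
    for x, targets in out.items():
        for y in targets:
            if y == x:
                continue
            for z in out.get(y, ()):
                if z not in (x, y) and z in targets:
                    count += 1
    return count

def count_bifan(out):
    """Count bi-fans: pairs of regulators sharing two common targets."""
    total = 0
    for r1, r2 in combinations(sorted(out), 2):
        shared = (out[r1] & out[r2]) - {r1, r2}
        total += comb(len(shared), 2)
    return total

def closes_ffl(out, x, z):
    """True if some known Y has X->Y and Y->Z, so adding X->Z would form an FFL."""
    return any(z in out.get(y, ()) for y in out.get(x, ()) if y not in (x, z))

def motif_filtered_negatives(candidates, positives):
    """Hypothetical heuristic: keep only candidates that would not close an FFL."""
    out = adjacency(positives)
    return [(x, z) for x, z in candidates if not closes_ffl(out, x, z)]

known = [("A", "B"), ("B", "C"), ("A", "C"), ("B", "D")]
print(count_ffl(adjacency(known)))  # 1
print(motif_filtered_negatives([("A", "D"), ("B", "A"), ("C", "D")], known))
# [('B', 'A'), ('C', 'D')]
```

Here ("A", "D") is dropped because A→B and B→D are known; analogous filters could be written for bi-fans or single-input modules.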

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cerulo, L., Paduano, V., Zoppoli, P., Ceccarelli, M. (2011). Labeling Negative Examples in Supervised Learning of New Gene Regulatory Connections. In: Rizzo, R., Lisboa, P.J.G. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2010. Lecture Notes in Computer Science, vol 6685. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21946-7_13

  • DOI: https://doi.org/10.1007/978-3-642-21946-7_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21945-0

  • Online ISBN: 978-3-642-21946-7

  • eBook Packages: Computer Science, Computer Science (R0)