Handling Unlabeled Data in Gene Regulatory Network

  • Sasmita Rout
  • Tripti Swarnkar
  • Saswati Mahapatra
  • Debabrata Senapati
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 199)


A gene is treated as a unit of heredity in a living organism; it resides on a stretch of DNA. A Gene Regulatory Network (GRN) is a network of transcriptional dependencies among the genes of an organism. A GRN can be inferred from microarray data by either an unsupervised or a supervised approach. It has been observed that supervised methods yield more accurate results than unsupervised methods. However, supervised methods require both positive and negative examples for training. In the biological literature only positive examples are available, since biologists are unable to state that two genes do not interact. A commonly adopted solution is to treat a random subset of the unlabeled examples as negatives, but such random selection may degrade the performance of the classifier. When labeled data are limited, learning performance is usually expected to improve by exploiting unlabeled data. In this paper we propose a novel approach to filter reliable, strong negative examples out of the unlabeled data, so that a supervised model can be trained properly. We tested this method on predicting regulation in E. coli and observed better results than with other unsupervised and supervised methods. The method is based on the principle of dividing the whole domain into gene clusters and then finding the most informative cluster for further classification.
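The abstract only outlines the negative-filtering step, so the following is a minimal sketch under stated assumptions, not the authors' procedure. It assumes a per-transcription-factor setting: each gene is represented by its expression profile, the genes known to be targets of one transcription factor are the positives, and all other genes are unlabeled. The genes are split with plain k-means, and, as a hypothetical reading of "most informative cluster", the cluster with the lowest share of known targets supplies the reliable negatives. The function names `kmeans` and `reliable_negatives`, the farthest-point seeding, the tie-breaking rule and the toy profiles are all illustrative inventions.

```python
import math
from typing import Dict, List, Sequence, Set


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Sequence[float]], k: int, iters: int = 100) -> List[int]:
    """Plain Lloyd's k-means with deterministic farthest-point seeding.

    Returns the cluster index assigned to each point (assumes points is non-empty).
    """
    # Seeding (an assumption for reproducibility): start from the first point,
    # then repeatedly add the point farthest from every centroid chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        centroids.append(list(far))

    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: euclidean(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def reliable_negatives(profiles: Dict[str, Sequence[float]], positives: Set[str], k: int) -> Set[str]:
    """Hypothetical filter: cluster all genes, then return the unlabeled genes of
    the cluster whose share of known positives is lowest (ties: larger cluster)."""
    genes = sorted(profiles)  # fixed order keeps the result reproducible
    labels = kmeans([profiles[g] for g in genes], k)

    clusters: Dict[int, List[str]] = {}
    for gene, lab in zip(genes, labels):
        clusters.setdefault(lab, []).append(gene)

    def positive_share(members: List[str]) -> float:
        return sum(g in positives for g in members) / len(members)

    best = min(clusters.values(), key=lambda m: (positive_share(m), -len(m)))
    return {g for g in best if g not in positives}


# Toy data (invented): b1-b3 are known targets of one transcription factor.
profiles = {
    "b1": [5.0, 5.1], "b2": [4.9, 5.0], "b3": [5.2, 4.8],
    "u1": [5.1, 5.0],
    "u2": [0.1, 0.2], "u3": [0.0, 0.1], "u4": [0.2, 0.0],
}
positives = {"b1", "b2", "b3"}

print(sorted(reliable_negatives(profiles, positives, k=2)))  # ['u2', 'u3', 'u4']
```

In this toy run the unlabeled gene `u1` is not returned, because it clusters with the known targets; it is exactly the kind of example that random negative sampling could mislabel. The returned genes, together with the known positives, would then form the training set for a supervised classifier such as an SVM.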


Keywords: Gene, Gene Regulatory Network, Unlabeled data, SVM, K-Means Cluster, Transcription Factor





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Sasmita Rout (1)
  • Tripti Swarnkar (1)
  • Saswati Mahapatra (1)
  • Debabrata Senapati (1)

  1. Department of Computer Applications, ITER, SOA University, Bhubaneswar, India
