Extensions of Naive Bayes and Their Applications to Bioinformatics

  • Conference paper
Bioinformatics Research and Applications (ISBRA 2007)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4463)

Abstract

In this paper we study naïve Bayes, a popular machine learning algorithm, and improve its accuracy without seriously affecting its computational efficiency. Naïve Bayes assumes positional independence, which simplifies the computation of the joint probability at the expense of accuracy and of fidelity to the underlying reality. In addition, the prior probabilities of positive and negative instances are computed from the training instances, which often do not accurately reflect the real prior probabilities. We address these two issues. First, we have developed algorithms that automatically perturb the computed prior probabilities and search their neighborhood to maximize a given objective function. Second, to improve prediction accuracy, we introduce limited dependency on the underlying pattern. We demonstrate the importance of these extensions by applying them to the problem of discriminating a TATA box from putative TATA boxes found in the promoter regions of plant genomes. The best prediction accuracy of naïve Bayes with 10-fold cross-validation was 69%, while the second extension gave a prediction accuracy of 79%, which is better than the best result obtained from an artificial neural network.
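To make the first extension concrete, the following is a minimal Python sketch of a position-wise naïve Bayes classifier over DNA strings, followed by a search around the training-set prior. It is an illustration under stated assumptions, not the paper's algorithm: the abstract does not specify the search procedure or the objective function, so a simple grid search maximizing accuracy is assumed here. The function names, the pseudocount, the `radius` and `steps` parameters, and the toy sequences are all hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def fit_position_probs(seqs, pseudocount=1.0):
    """Estimate P(base | position) for equal-length sequences.

    Laplace smoothing (an assumption of this sketch) avoids zero probabilities.
    """
    length = len(seqs[0])
    total = len(seqs) + pseudocount * len(ALPHABET)
    probs = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        probs.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return probs


def log_likelihood(seq, probs):
    """Sum of per-position log probabilities (positional independence)."""
    return sum(math.log(p[b]) for b, p in zip(seq, probs))


def predict(seq, pos_probs, neg_probs, prior_pos):
    """Return True when the positive class has the higher posterior score."""
    score_pos = math.log(prior_pos) + log_likelihood(seq, pos_probs)
    score_neg = math.log(1.0 - prior_pos) + log_likelihood(seq, neg_probs)
    return score_pos > score_neg


def accuracy(data, pos_probs, neg_probs, prior_pos):
    """Fraction of (sequence, label) pairs classified correctly."""
    hits = sum(predict(s, pos_probs, neg_probs, prior_pos) == y for s, y in data)
    return hits / len(data)


def search_prior(data, pos_probs, neg_probs, start, radius=0.2, steps=21):
    """Grid search in [start - radius, start + radius] for the best prior.

    Assumed objective: accuracy on `data`. Ties keep the earlier candidate.
    """
    best_prior = start
    best_acc = accuracy(data, pos_probs, neg_probs, start)
    for k in range(steps):
        cand = start - radius + 2.0 * radius * k / (steps - 1)
        if not 0.0 < cand < 1.0:
            continue  # a prior must stay strictly inside (0, 1)
        acc = accuracy(data, pos_probs, neg_probs, cand)
        if acc > best_acc:
            best_prior, best_acc = cand, acc
    return best_prior, best_acc


if __name__ == "__main__":
    # Made-up toy sequences, not data from the paper.
    positives = ["TATAAA", "TATATA", "TATAAT"]
    negatives = ["TATGCA", "TACGCA", "TTTGCG"]
    pos_probs = fit_position_probs(positives)
    neg_probs = fit_position_probs(negatives)
    data = [(s, True) for s in positives] + [(s, False) for s in negatives]
    start = len(positives) / len(data)  # prior estimated from training counts
    print(search_prior(data, pos_probs, neg_probs, start))
```

In practice the objective would be evaluated on held-out data (for example, within cross-validation folds) rather than on the training set, so that the tuned prior does not simply overfit the training instances.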

Editor information

Ion Măndoiu, Alexander Zelikovsky

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Loganantharaj, R. (2007). Extensions of Naive Bayes and Their Applications to Bioinformatics. In: Măndoiu, I., Zelikovsky, A. (eds) Bioinformatics Research and Applications. ISBRA 2007. Lecture Notes in Computer Science, vol 4463. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72031-7_26

  • DOI: https://doi.org/10.1007/978-3-540-72031-7_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-72030-0

  • Online ISBN: 978-3-540-72031-7

  • eBook Packages: Computer Science, Computer Science (R0)
