
Discovering Essential Domains in Essential Genes

  • Yulan Lu
  • Yao Lu
  • Jingyuan Deng
  • Hui Lu
  • Long Jason Lu
Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 1279)

Abstract

Genes with indispensable functions are identified as essential; however, the traditional gene-level perspective of essentiality has several limitations. We hypothesized that protein domains, the independent structural or functional units of a polypeptide chain, are responsible for gene essentiality. If the essentiality of domains is known, essential genes can be identified from the domains they contain. To find such essential domains, we developed an expectation-maximization (EM) algorithm-based Essential Domain Prediction (EDP) model. On simulated datasets, the model converged to consistent results from different initial values and gave accurate predictions even in the presence of noise. We then applied the EDP model to six microbial species and predicted 3,450 domains to be essential in at least one species, with the proportion ranging from 8 % to 24 % in each species.
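The idea of inferring domain essentiality from gene-level labels can be illustrated with a small sketch. This is not the EDP model itself: it assumes a simplified "noisy-OR" formulation in which a gene is essential if at least one of its domains is essential, ignores experimental noise, and uses a hypothetical function name (`em_domain_essentiality`) and toy data invented for illustration.

```python
from math import prod


def em_domain_essentiality(genes, essential, n_iter=200, init=0.5):
    """Estimate P(domain is essential) with EM under a noisy-OR assumption.

    genes:     dict mapping gene id -> iterable of domain ids (hypothetical input format)
    essential: dict mapping gene id -> True/False (observed gene essentiality)
    """
    gene_domains = {g: set(ds) for g, ds in genes.items()}
    domains = sorted({d for ds in gene_domains.values() for d in ds})
    theta = {d: init for d in domains}

    for _ in range(n_iter):
        expected = {d: 0.0 for d in domains}  # expected count of "essential" instances
        count = {d: 0 for d in domains}       # number of genes containing each domain
        for g, ds in gene_domains.items():
            # Probability that the gene carries at least one essential domain.
            p_gene = 1.0 - prod(1.0 - theta[d] for d in ds)
            for d in ds:
                count[d] += 1
                # E-step: posterior that this domain instance is essential.
                # It is zero when the gene is observed to be nonessential.
                if essential[g] and p_gene > 0.0:
                    expected[d] += min(1.0, theta[d] / p_gene)
        # M-step: average posterior over all genes containing the domain.
        theta = {d: expected[d] / count[d] for d in domains}
    return theta


# Toy data (invented): domain A only occurs in essential genes,
# domains B and C also occur in nonessential genes.
toy_genes = {"g1": ["A"], "g2": ["A", "B"], "g3": ["B"], "g4": ["B", "C"], "g5": ["C"]}
toy_labels = {"g1": True, "g2": True, "g3": False, "g4": False, "g5": False}

for start in (0.1, 0.5, 0.9):
    estimates = em_domain_essentiality(toy_genes, toy_labels, init=start)
    print(start, {d: round(p, 3) for d, p in estimates.items()})
```

On this toy input, the estimate for domain A approaches 1 and the estimates for B and C approach 0 from each of the three starting values, which mirrors the kind of convergence check described above. A model intended for real screens would also need terms for false-positive and false-negative essentiality calls, which this sketch omits.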

Key words

  • Essential genes
  • Domains
  • Essentiality
  • Synthetic biology
  • EM algorithm

Notes

Acknowledgement

This work was supported by the doctoral student exchange program fund of the Fudan University Graduate School (to Yulan Lu).

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Yulan Lu (1)
  • Yao Lu (2)
  • Jingyuan Deng (3)
  • Hui Lu (2, 4)
  • Long Jason Lu (3, 5, 6, 7)
  1. State Key Laboratory of Genetic Engineering, Institute of Biostatistics, School of Life Science, Fudan University, Shanghai, People’s Republic of China
  2. Shanghai Institute of Medical Genetics, Children’s Hospital of Shanghai, Shanghai Jiao Tong University, Shanghai, People’s Republic of China
  3. Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, USA
  4. Department of Bioengineering (MC 063), University of Illinois at Chicago, Chicago, USA
  5. Department of Computer Science, University of Cincinnati, Cincinnati, USA
  6. Department of Environmental Health, University of Cincinnati, Cincinnati, USA
  7. Division of Biostatistics and Epidemiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, USA
