Abstract
In recent years, a number of metrics have been introduced to characterize the topology of complex networks. We use these metrics to analyze networks obtained through BLAST data mining. The algorithm we present consists of the following steps: (1) encode the results of BLAST searches as a distance matrix of e-values; (2) perform an entropy-controlled clustering analysis to identify the communities; (3) perform a statistical analysis of the resulting network; (4) use gene ontology and data mining in sequence databases to infer the function of the identified clusters. We report on the analysis of two data sets: the first is formed by over 3300 plasmid-encoded proteins and the second comprises over 4200 sequences related to nitrogen fixation proteins. In the first case we observed strong selective pressures for horizontal transfer and maintenance of genes encoding proteins for resistance to antibiotics, plasmid stability and conjugal transfer. On the basis of our results, nitrogen fixation proteins can be divided into three groups: proteins with no paralogs in any of the genomes considered, proteins with paralogs belonging to different metabolic processes (O–paralogs), and proteins with paralogs in both the same and other metabolic processes (I/O–paralogs).
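The first three steps of the pipeline can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how e-values are transformed, which cutoff is used, or how the entropy-controlled clustering works. The sketch assumes a `-log10(e-value)` edge weight, a hypothetical cutoff `max_evalue`, and plain connected components as a stand-in for the clustering step. The function names and the toy `hits` list are likewise illustrative.

```python
import math
from collections import defaultdict


def build_graph(hits, max_evalue=1e-5):
    """Build an undirected weighted graph from (query, subject, evalue) hits.

    Assumption: edge weight is -log10(e-value), and only the best (smallest)
    e-value is kept for each unordered pair of sequences.
    """
    best = {}
    for query, subject, evalue in hits:
        if query == subject or evalue > max_evalue:
            continue  # skip self-hits and hits weaker than the cutoff
        pair = tuple(sorted((query, subject)))
        best[pair] = min(evalue, best.get(pair, float("inf")))

    graph = defaultdict(dict)
    for (a, b), evalue in best.items():
        # Clamp e-values reported as 0 so that the logarithm is defined.
        weight = -math.log10(max(evalue, 1e-180))
        graph[a][b] = weight
        graph[b][a] = weight
    return graph


def connected_components(graph, nodes):
    """Group nodes into connected components (a stand-in for the
    entropy-controlled clustering named in the abstract)."""
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph.get(node, {}).keys() - component)
        seen |= component
        components.append(component)
    return components


# Toy input: hypothetical sequence names and e-values, not data from the paper.
hits = [
    ("A", "B", 1e-50),
    ("B", "A", 1e-40),
    ("B", "C", 1e-20),
    ("D", "E", 1e-3),   # weaker than the cutoff, so no edge is created
    ("F", "F", 0.0),    # self-hit, ignored
]

sequences = sorted({name for q, s, _ in hits for name in (q, s)})
graph = build_graph(hits)
clusters = connected_components(graph, sequences)
degree = {name: len(graph.get(name, {})) for name in sequences}

print(clusters)
print(degree)
```

Node degree is only the simplest topological metric; quantities such as betweenness centrality would be computed on the same graph structure.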
Keywords
- Betweenness Centrality
- Pyruvate Carboxylase
- Soft Matter Phys
- Topological Metrics
- Oxalacetate Decarboxylase
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lió, P., Brilli, M., Fani, R. (2008). Topological Metrics in Blast Data Mining: Plasmid and Nitrogen-Fixing Proteins Case Studies. In: Elloumi, M., Küng, J., Linial, M., Murphy, R.F., Schneider, K., Toma, C. (eds) Bioinformatics Research and Development. BIRD 2008. Communications in Computer and Information Science, vol 13. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70600-7_16
DOI: https://doi.org/10.1007/978-3-540-70600-7_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70598-7
Online ISBN: 978-3-540-70600-7
eBook Packages: Computer Science, Computer Science (R0)