Protocols for Probing Genome Architecture of Regulatory Networks in Hydrocarbon and Lipid Microorganisms

Hydrocarbon and Lipid Microbiology Protocols

Part of the book series: Springer Protocols Handbooks (SPH)

Abstract

Genome architecture and the regulation of gene expression are expected to be interdependent, and understanding this interdependence is key to successful genome engineering. Evidence for the nonrandom arrangement of genes along genomes, defined as the relative positioning of cofunctional or co-regulated genes, stems from two main approaches. Firstly, the analysis of contiguous genome segments across species has highlighted the conservation of gene order (synteny) along chromosome regions. Secondly, the study of long-range regularities along the chromosomes of a given species has emphasised the periodic positioning of microbial genes that are co-regulated, evolutionarily correlated, or highly codon biased. Software tools that detect, visualise, systematically analyse and exploit gene position regularities along genomes can facilitate the study of such nonrandom genome layouts, support the inference of transcription factor binding sites, and potentially guide rational genome design. Here, a computational protocol is demonstrated for the analysis and exploitation of regular patterns in a set of genomic features of interest (e.g. cofunctional or co-regulated genes, or chromatin immunoprecipitation results). The case study is conducted on genes involved in hydrocarbon metabolism in the marine petroleum-degrading bacterium Alcanivorax borkumensis.
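
To make the idea of periodic gene positioning concrete, the following is a minimal sketch that folds gene positions onto a candidate period and measures how tightly their phases cluster. It is not the algorithm used by the protocol's software; the function names `phase_concentration` and `best_period`, the scoring by mean resultant length, and the synthetic positions are all assumptions introduced for illustration only.

```python
import math


def phase_concentration(positions, period):
    """Return the mean resultant length (0..1) of positions folded onto a period.

    Values near 1 mean the positions share almost the same phase modulo
    `period`; values near 0 mean the phases are spread around the circle.
    """
    if not positions:
        raise ValueError("positions must not be empty")
    if period <= 0:
        raise ValueError("period must be positive")
    # Map each position to an angle on a circle of circumference `period`.
    angles = [2.0 * math.pi * (pos % period) / period for pos in positions]
    mean_cos = sum(math.cos(a) for a in angles) / len(angles)
    mean_sin = sum(math.sin(a) for a in angles) / len(angles)
    return math.hypot(mean_cos, mean_sin)


def best_period(positions, candidate_periods):
    """Return the (period, score) pair with the highest phase concentration."""
    scored = [(p, phase_concentration(positions, p)) for p in candidate_periods]
    return max(scored, key=lambda item: item[1])


# Synthetic, made-up positions spaced by multiples of 50,000 bp (not real data).
synthetic = [12_000 + 50_000 * k for k in (0, 1, 3, 4, 7, 9)]
candidates = (30_000, 40_000, 50_000, 60_000, 70_000)
period, score = best_period(synthetic, candidates)
print(period, round(score, 3))
```

Note that any exact divisor of a true period (for example 10,000 bp here) would score equally well under this simple measure, which is one reason a real analysis needs a statistical assessment of each candidate period rather than a raw score.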

Acknowledgments

The authors thank the MEGA team members at iSSB for excellent discussions. This work was supported by Genopole, by the OSEO/BPI-France ‘BioIntelligence’ consortium and by the EU FP7 KBBE project ‘ST-FLOW’.

Corresponding author

Correspondence to François Képès.

Appendices

Appendix 1: Input File Format for a GREAT:SCAN:patterns Analysis (This Example Contains the Genes from A. borkumensis Involved in Hydrocarbon Degradation)

dhmA    2734484
alkB1   3063242
alkB2   130408
aldh    3066633
alkK    3110974
alkL    2198588
alkN    114433
rubB    170672
rubA    171870
GntR    129582
p450    2707944
p450b   2607794
p450c   217210
fdx     216875
alkJ2   218643
FAD     220424
AraC    215704
alkG    3064575
alkS    3060396
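
As an illustration of this two-column layout, here is a small sketch of a reader that turns such a file into a mapping from entity ID to genomic position. It assumes one entity per line, in the `<entity_ID> <entity_position>` form given in the usage message of Appendix 2, and it tolerates a colon after the ID. The function name `parse_positions` is hypothetical and not part of GREAT:SCAN; the sample uses the first three entries of this appendix.

```python
def parse_positions(text):
    """Parse '<entity_ID> <entity_position>' lines into a dict of int positions."""
    positions = {}
    for line_number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        # Treat an optional ':' separator as whitespace, then split into fields.
        fields = line.replace(":", " ").split()
        if len(fields) != 2:
            raise ValueError(
                f"line {line_number}: expected 2 fields, got {len(fields)}"
            )
        entity_id, position = fields
        if entity_id in positions:
            raise ValueError(f"line {line_number}: duplicate ID {entity_id!r}")
        positions[entity_id] = int(position)  # raises ValueError if not numeric
    return positions


sample = """\
dhmA    2734484
alkB1   3063242
alkB2 : 130408
"""
genes = parse_positions(sample)
# List the entities in chromosomal order.
ordered = sorted(genes, key=genes.get)
print(ordered)
```

Rejecting duplicate IDs and malformed lines early is a design choice of this sketch, so that a mistyped coordinate is caught before any period analysis is attempted.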

Appendix 2: Usage Message of GREAT:SCAN:patterns

usage: patterns.R [-h] -t [<title> [<title> ...]]
                  [-l <genome_in_bp>]
                  [-a <avgGene_in_bp>]
                  [-r [<per_bounds> [<per_bounds> ...]]]
                  [-p <pvalue_thres>]
                  [-s <pvalue_select>]
                  [-d [<set_coords> [<set_coords> ...]]]
                  [-k [<set_ticks> [<set_ticks> ...]]]
                  [-c <clust_exponent>]
                  [-z <cluster_size>]
                  [-m <pvalue_mapping>]
                  [-i [<a_uniq_ID>]] [-v <path>]
                  [-o <output_path>]
                  <file_name>

Systematically analyse, cluster and visualise results from a complete GREAT:SCAN analysis. Full global_spectrum (-DOM and -CIRC analysis) followed by a DBSCAN clustering to identify the in-phase genes and a solenoid_map (sliding window) analysis and visualisation of the spread of all the possible periods.

positional arguments:

  <file_name>
      The input file consisting of two columns of data formatted like this: <entity_ID> <entity_position>

optional arguments:

  -h, --help
      show this help message and exit

  -t [<title> [<title> ...]], --title [<title> [<title> ...]]
      A substring to specify a title for the experiment (default: None)

  -l <genome_in_bp>, --chrom_length <genome_in_bp>
      The length in bp of the organism chromosome (default: 4639675)

  -a <avgGene_in_bp>, --avg_gene <avgGene_in_bp>
      The average gene length of the organism genes (default: 1000)

  -r [<per_bounds> [<per_bounds> ...]], --period_range [<per_bounds> [<per_bounds> ...]]
      The range (min. -- max.) within which periods will be considered for further analysis (default: 5000)

  -p <pvalue_thres>, --pvalue_thres <pvalue_thres>
      The unweighted p-value threshold for considering a period for further analysis (default: 0.05)

  -s <pvalue_select>, --pvalue_select <pvalue_select>
      The weighted p-value threshold for selecting which periods will be printed (default: 0.05)

  -d [<set_coords> [<set_coords> ...]], --plot_coords [<set_coords> [<set_coords> ...]]
      Specifies a set of genomic coordinates to be printed as significant genome marks in the mapping plot (the E.coli macrodomains are defaults: [46396, 603158, 1206296, 2180612, 2876552, 3758076])

  -k [<set_ticks> [<set_ticks> ...]], --plot_ticks [<set_ticks> [<set_ticks> ...]]
      Specifies a set of axis ticks to be printed as indicators of genome marks in the mapping plot (must be equal size with the coordinates). (default: ['ori', 'right', 'R/ter', 'ter/L', 'left', 'ori'])

  -c <clust_exponent>, --clust_exp <clust_exponent>
      The clustering exponent. Assigns the minimum distance d between two points to be members of the same cluster. Specifies the exponent of the ratio between the length of the period and chromosome length (p/L). (default: 0.5)

  -z <cluster_size>, --clust_size <cluster_size>
      The minimum number of members for a group to be considered as a cluster (DBSCAN parameter) (default: 2)

  -m <pvalue_mapping>, --pvalue_map <pvalue_mapping>
      The weighted p-value threshold for selecting which sliding window periods will be plotted (default: 0.001)

  -i [<a_uniq_ID>], --uniq_ID [<a_uniq_ID>]
      The unique ID for the generation of the results folder. (default: patternAnalysis_xxxx_xx_xx)

  -v <path>, --pv <path>
      The path to the 'pv' fit parameters file. (default: <installation_of_cmdline_programs>)

  -o <output_path>, --output_path <output_path>
      The absolute path for a directory (existing one including the trailing slash '/') where the output will be kept, or omit for the current working directory. (just the path, the directory name itself is controlled by the -i option). (default: <current_working_dir>)
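
The options above can be combined into a command line as sketched below. This is a hedged illustration only: the helper `build_patterns_command`, the input file name and the experiment title are hypothetical, and how `patterns.R` is actually launched (directly or through an R interpreter) is an assumption to check against your installation. The chromosome length of 3120143 bp is the commonly reported size of the A. borkumensis SK2 chromosome and should be verified against the genome record you use. Only flags listed in the usage message appear here.

```python
import shlex


def build_patterns_command(input_file, title, chrom_length,
                           avg_gene=None, pvalue_thres=None,
                           output_path=None, script="patterns.R"):
    """Assemble an argument list for patterns.R from the documented flags."""
    if chrom_length <= 0:
        raise ValueError("chrom_length must be a positive number of bp")
    # -t accepts several words, so it is followed by another option (-l)
    # rather than directly by the positional file name, which it could
    # otherwise absorb as part of the title.
    command = [script, "-t", title, "-l", str(chrom_length)]
    if avg_gene is not None:
        command += ["-a", str(avg_gene)]
    if pvalue_thres is not None:
        command += ["-p", str(pvalue_thres)]
    if output_path is not None:
        command += ["-o", output_path]
    command.append(input_file)  # positional <file_name> goes last
    return command


# Hypothetical file name and title; the length is an assumption to verify.
command = build_patterns_command(
    "abo_hydrocarbon_genes.txt",
    "Abo hydrocarbon degradation",
    3120143,
)
print(shlex.join(command))
```

Setting `-l` explicitly matters for this case study because the default of 4639675 bp and the default `-d` plot coordinates are E. coli values according to the usage message; the largest position in Appendix 1 (3110974) would otherwise be interpreted against a chromosome of the wrong length.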

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this protocol

Cite this protocol

Bouyioukos, C., Elati, M., Képès, F. (2015). Protocols for Probing Genome Architecture of Regulatory Networks in Hydrocarbon and Lipid Microorganisms. In: McGenity, T., Timmis, K., Nogales, B. (eds) Hydrocarbon and Lipid Microbiology Protocols. Springer Protocols Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/8623_2015_92

  • DOI: https://doi.org/10.1007/8623_2015_92

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-50430-7

  • Online ISBN: 978-3-662-50432-1

  • eBook Packages: Springer Protocols
