Finding Teams in Graphs and Its Application to Spatial Gene Cluster Discovery

  • Tizian Schulz
  • Jens Stoye
  • Daniel Doerr
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10562)

Abstract

Gene clusters are sets of genes in a genome with associated functionality. Often, they exhibit close proximity to each other on the chromosome, which can be beneficial for their common regulation. A popular strategy for finding gene clusters is to exploit this proximity by identifying sets of genes that are consistently close to each other on their respective chromosomal sequences across several related species.

Yet, even more than gene proximity on linear DNA sequences, the spatial conformation of chromosomes may provide a pivotal indicator for common regulation and/or associated function of sets of genes.

We present the first gene cluster model capable of handling spatial data. Our model extends a popular computational model for gene cluster prediction, called \(\delta \)-teams, from sequences to general graphs. In this generalization, \(\delta \)-teams are single-linkage clusters of a set of vertices shared by two or more undirected weighted graphs, such that the largest link in a cluster does not exceed a given threshold \(\delta \) in any of the input graphs.
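
The definition above can be illustrated with a naive partition-refinement sketch. It is not the algorithm from the paper; it only follows the general divide-and-conquer idea known from the gene-team literature, under one reading of the definition: a set counts as a team when, in every input graph, its members are connected using only edges of weight at most δ between members. Refinement is sound because any such set inside a candidate must lie within a single δ-component of that candidate in every graph. The representation (each graph as a Python dict mapping an undirected edge (u, v) to its weight, with a missing edge meaning "no link") and the function names `components` and `delta_teams` are hypothetical choices for illustration.

```python
from typing import Dict, FrozenSet, Hashable, List, Set, Tuple

Vertex = Hashable
# Hypothetical representation: undirected edge (u, v) -> weight, each edge stored once.
Graph = Dict[Tuple[Vertex, Vertex], float]


def components(vertices: Set[Vertex], graph: Graph, delta: float) -> List[Set[Vertex]]:
    """Connected components of the subgraph induced by `vertices`,
    keeping only edges of weight <= delta (single linkage at threshold delta)."""
    adj: Dict[Vertex, Set[Vertex]] = {v: set() for v in vertices}
    for (u, v), w in graph.items():
        # Vertices outside the candidate set never act as intermediate links.
        if w <= delta and u in vertices and v in vertices:
            adj[u].add(v)
            adj[v].add(u)
    seen: Set[Vertex] = set()
    comps: List[Set[Vertex]] = []
    for start in vertices:
        if start in seen:
            continue
        # Iterative depth-first search collecting one component.
        stack = [start]
        seen.add(start)
        comp = {start}
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    comp.add(y)
                    stack.append(y)
        comps.append(comp)
    return comps


def delta_teams(shared: Set[Vertex], graphs: List[Graph], delta: float) -> List[FrozenSet[Vertex]]:
    """Split a candidate set whenever it is not delta-connected in some graph;
    candidates that are delta-connected in every graph are reported as teams."""
    teams: List[FrozenSet[Vertex]] = []
    work: List[Set[Vertex]] = [set(shared)] if shared else []
    while work:
        candidate = work.pop()
        for g in graphs:
            parts = components(candidate, g, delta)
            if len(parts) > 1:
                work.extend(parts)  # refine and re-examine each part
                break
        else:
            teams.append(frozenset(candidate))  # connected in all graphs
    return teams


if __name__ == "__main__":
    g1 = {("a", "b"): 1.0, ("b", "c"): 1.0, ("c", "d"): 5.0}
    g2 = {("a", "c"): 1.0, ("b", "c"): 2.0, ("c", "d"): 1.0}
    shared = {"a", "b", "c", "d"}
    print(sorted(sorted(t) for t in delta_teams(shared, [g1, g2], 2.0)))
    # prints [['a', 'b', 'c'], ['d']]
    print(sorted(sorted(t) for t in delta_teams(shared, [g1, g2], 1.5)))
    # prints [['a'], ['b'], ['c'], ['d']]
```

With δ = 2 the toy graphs yield the teams {a, b, c} and {d}. Lowering δ to 1.5 removes the b–c link in the second graph, so b is split off; a and c are then separated as well, because in the first graph they were linked only through b, which is no longer a member of their candidate set.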

We apply our model to human and mouse data to find spatial gene clusters, i.e., gene sets with functional associations whose members lie close to each other in the spatial conformation of the chromosome across species.

Keywords

Spatial gene cluster, Gene teams, Single-linkage clustering, Graph teams, Hi-C data

Notes

Acknowledgements

We are very grateful to Krister Swenson for kindly providing the Hi-C data used in this study and for his many valuable suggestions. We wish to thank Pedro Feijão for many fruitful discussions in the beginning of this project. This work was partially supported by DFG GRK 1906/1.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Faculty of Technology and CeBiTec, Bielefeld University, Bielefeld, Germany
  2. International Research Training Group 1906 “Computational Methods for the Analysis of the Diversity and Dynamics of Genomes”, Bielefeld University, Bielefeld, Germany
