Reverse de Bruijn: Utilizing Reverse Peptide Synthesis to Cover All Amino Acid k-mers

Yaron Orenstein
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10812)


Peptide arrays measure the binding intensity of a specific protein to thousands of amino acid peptides. Using peptides that together cover all k-mers yields a comprehensive picture of the binding spectrum. Researchers would like to measure binding to k-mers that are as long as possible, but are constrained by the number of peptides that fit on a single microarray. A key challenge is therefore designing a minimum number of peptides that cover all k-mers. Here, we propose a novel way to reduce the length of the sequence covering all k-mers by exploiting a unique property of the peptide synthesis process. Since synthesis can start from either end of the peptide template, it suffices to cover each k-mer or its reverse and to use each template twice, once forward and once reversed. The computational problem then becomes generating a minimum-length sequence that contains, for every k-mer, either the k-mer itself or its reverse. We developed an algorithm, ReverseCAKE, to generate such a sequence. ReverseCAKE runs in time linear in the output size and is guaranteed to produce a sequence that is at most \(\varTheta (\sqrt{n}\log {n})\) characters longer than the optimal length \(n\). The saving factor obtained by ReverseCAKE approaches the theoretical lower bound as k increases. In addition, we formulated the problem as an integer linear program and empirically observed that the solutions obtained by ReverseCAKE are near-optimal. This work enables more effective design of peptide microarrays.
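The covering condition described above can be made concrete with a short Python sketch. This is our own illustration and not ReverseCAKE, whose construction is not described in this abstract; the names `canonical`, `num_classes`, `covers_all`, and `greedy_reverse_cover` are hypothetical. The sketch checks whether a sequence contains every k-mer or its reverse, and builds a valid (but not necessarily short) sequence with a naive greedy rule. It also encodes a simple counting bound, assuming only that each length-k window covers exactly one class {k-mer, reversed k-mer}: a palindromic k-mer is its own reverse, so over an alphabet of size \(\sigma\) there are \(C = (\sigma^k + \sigma^{\lceil k/2 \rceil})/2\) classes, and any valid sequence needs at least \(C + k - 1\) characters, compared with \(\sigma^k + k - 1\) for a linear de Bruijn sequence.

```python
from itertools import product
from math import ceil


def canonical(kmer: str) -> str:
    """Pick one representative of the class {kmer, reversed kmer}."""
    return min(kmer, kmer[::-1])


def num_classes(alphabet_size: int, k: int) -> int:
    """Count k-mers up to reversal; palindromes are their own reverse."""
    palindromes = alphabet_size ** ceil(k / 2)
    return (alphabet_size ** k + palindromes) // 2


def covers_all(seq: str, alphabet: str, k: int) -> bool:
    """True if every k-mer over `alphabet` occurs in seq, forward or reversed."""
    seen = {canonical(seq[i:i + k]) for i in range(len(seq) - k + 1)}
    return all(canonical("".join(p)) in seen
               for p in product(alphabet, repeat=k))


def greedy_reverse_cover(alphabet: str, k: int) -> str:
    """Hypothetical naive greedy baseline (NOT ReverseCAKE).

    Extend by one letter whenever that covers a still-uncovered class;
    at a dead end, append a whole uncovered k-mer.
    """
    remaining = {canonical("".join(p)) for p in product(alphabet, repeat=k)}
    seq = min(remaining)
    remaining.discard(seq)
    while remaining:
        suffix = seq[len(seq) - k + 1:]  # last k-1 letters ("" when k == 1)
        for c in alphabet:
            cand = canonical(suffix + c)
            if cand in remaining:
                seq += c
                remaining.discard(cand)
                break
        else:
            # Dead end: jump to an uncovered class, then mark every new window
            # (including those spanning the junction) as covered.
            start = len(seq) - k + 1
            seq += min(remaining)
            for i in range(start, len(seq) - k + 1):
                remaining.discard(canonical(seq[i:i + k]))
    return seq


if __name__ == "__main__":
    alphabet = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
    k = 2
    s = greedy_reverse_cover(alphabet, k)
    assert covers_all(s, alphabet, k)
    # greedy length, counting bound C + k - 1, linear de Bruijn length
    print(len(s), num_classes(len(alphabet), k) + k - 1, len(alphabet) ** k + k - 1)
```

With the 20 standard amino acids and k = 2, the counting bound is 211 characters and a linear de Bruijn sequence has 401; the greedy length printed first is simply what this naive rule produces and says nothing about ReverseCAKE's performance. The greedy rule always terminates because every loop iteration marks at least one new class as covered.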


Keywords: de Bruijn graph · de Bruijn sequence · Peptide array · Reverse synthesis · Array design



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel
