Filling a Protein Scaffold with a Reference

  • Letu Qingge
  • Xiaowen Liu
  • Farong Zhong
  • Binhai Zhu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9683)


In mass spectrometry-based de novo protein sequencing, it is hard to complete the sequence of the whole protein. Motivated by this, we study the (one-sided) problem of filling a protein scaffold \(\mathcal{S}\) with some missing amino acids. Given a sequence of contigs, none of which may be altered, and a complete reference protein \(\mathcal{P}\) of length n, the goal is to fill \(\mathcal{S}\) so that the BLOSUM62 score between \(\mathcal{P}\) and the filled sequence \(\mathcal{S}'\) is maximized. We show that this problem is solvable in polynomial time, namely in \(O(n^{26})\) time. We also consider the case where the contigs are not of high quality and are concatenated into an (incomplete) sequence \(\mathcal{I}\); the missing amino acids can then be inserted anywhere in \(\mathcal{I}\) to obtain \(\mathcal{I}'\), and the goal is to maximize the BLOSUM62 score between \(\mathcal{P}\) and \(\mathcal{I}'\). We show that this problem is solvable in \(O(n^{22})\) time. Because of their high running times, both algorithms are impractical; we therefore present several algorithms based on greedy and local search that aim to solve the problems in practice. The empirical results show that these algorithms can fill protein scaffolds almost perfectly, provided that a good pair of scaffold and reference is given.
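The abstract does not spell out the scoring model or the algorithms, so the following is only a minimal Python sketch of the one-sided problem under assumptions that are not taken from the paper: (i) the missing amino acids are given as a multiset that must be used in full, so the filled sequence has the same length n as \(\mathcal{P}\); (ii) the score is the sum of BLOSUM62 entries over aligned positions, with no gaps; (iii) missing residues may go only into the gaps before, between, and after the fixed contigs. `fill_exhaustive` is plain brute-force enumeration (exponential, not the authors' polynomial-time algorithm) and serves as ground truth on tiny inputs; `fill_local_search` is a generic hill-climbing heuristic with two hypothetical moves (swap two inserted residues, shift one slot between gaps), not the authors' greedy or local search. The score table is a five-residue excerpt of BLOSUM62, and all function names are hypothetical.

```python
from itertools import permutations

# Excerpt of BLOSUM62 over five residues (A, I, L, V, K), for illustration only;
# a real implementation would load the full 20x20 matrix.
_BLOSUM62_EXCERPT = {
    ("A", "A"): 4, ("I", "I"): 4, ("L", "L"): 4, ("V", "V"): 4, ("K", "K"): 5,
    ("A", "I"): -1, ("A", "L"): -1, ("A", "V"): 0, ("A", "K"): -1,
    ("I", "L"): 2, ("I", "V"): 3, ("I", "K"): -3,
    ("L", "V"): 1, ("L", "K"): -2, ("V", "K"): -2,
}


def blosum(a, b):
    """Symmetric lookup; raises KeyError for residues outside the excerpt."""
    key = (a, b) if (a, b) in _BLOSUM62_EXCERPT else (b, a)
    return _BLOSUM62_EXCERPT[key]


def score(reference, filled):
    """Assumed model: equal-length sequences scored position by position, no gaps."""
    if len(reference) != len(filled):
        raise ValueError("filled sequence must have the reference's length")
    return sum(blosum(a, b) for a, b in zip(reference, filled))


def compositions(total, parts):
    """Yield every split of `total` into `parts` non-negative gap sizes."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def decode(contigs, order, split):
    """Build X0 C1 X1 ... Ck Xk: gap i receives the next split[i] residues of order."""
    fillers, i = [], 0
    for size in split:
        fillers.append("".join(order[i:i + size]))
        i += size
    parts = [fillers[0]]
    for contig, filler in zip(contigs, fillers[1:]):
        parts += [contig, filler]  # contigs are copied unaltered
    return "".join(parts)


def fill_exhaustive(reference, contigs, missing):
    """Exact optimum by enumeration; exponential, usable only on tiny instances."""
    best = None
    for order in set(permutations(missing)):
        for split in compositions(len(missing), len(contigs) + 1):
            candidate = decode(contigs, order, split)
            s = score(reference, candidate)
            if best is None or s > best[0]:
                best = (s, candidate)
    return best


def fill_local_search(reference, contigs, missing):
    """Hill climbing over two moves; stops at a local optimum (not guaranteed global)."""
    order = list(missing)
    split = [0] * len(contigs) + [len(missing)]  # start: all residues after the last contig
    current = score(reference, decode(contigs, order, split))
    improved = True
    while improved:
        improved = False
        # Move 1: swap two missing residues.
        for i in range(len(order)):
            for j in range(i + 1, len(order)):
                order[i], order[j] = order[j], order[i]
                s = score(reference, decode(contigs, order, split))
                if s > current:
                    current, improved = s, True
                else:
                    order[i], order[j] = order[j], order[i]  # undo the swap
        # Move 2: shift one residue slot from gap a to gap b.
        for a in range(len(split)):
            for b in range(len(split)):
                if a == b or split[a] == 0:
                    continue
                split[a] -= 1
                split[b] += 1
                s = score(reference, decode(contigs, order, split))
                if s > current:
                    current, improved = s, True
                else:
                    split[a] += 1  # undo the shift
                    split[b] -= 1
    return current, decode(contigs, order, split)


print(fill_exhaustive("VKLAI", ["K", "A"], ["V", "L", "I"]))    # (21, 'VKLAI')
print(fill_local_search("VKLAI", ["K", "A"], ["V", "L", "I"]))  # (21, 'VKLAI')
```

Under the same assumptions, the second variant (inserting missing amino acids anywhere in \(\mathcal{I}\)) reduces to the first by passing each residue of \(\mathcal{I}\) as its own one-character contig, for example `list("KA")`, since that opens a gap between every pair of consecutive residues.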





This research is partially supported by NSF of China under grant 60928006 and by the Opening Fund of Top Key Discipline of Computer Software and Theory in Zhejiang Provincial Colleges at Zhejiang Normal University. We also thank the anonymous reviewers for their useful comments.


Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Letu Qingge (1)
  • Xiaowen Liu (2)
  • Farong Zhong (3)
  • Binhai Zhu (1), corresponding author
  1. Department of Computer Science, Montana State University, Bozeman, USA
  2. School of Informatics and Computing, Indiana University-Purdue University Indianapolis, Indianapolis, USA
  3. School of Mathematics, Physics and Informatics, Zhejiang Normal University, Jinhua, China
