Abstract
The problem of finding the coordinates of n points on a line such that their pairwise distances form a given multiset of \(\binom{n}{2}\) distances is known as the Partial Digest problem; it occurs, for instance, in DNA physical mapping and de novo sequencing of proteins. Although Partial Digest was proposed as a combinatorial problem as early as the 1930s, its computational complexity is still unknown.
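To make the problem statement concrete, the following minimal sketch checks whether a candidate point set realizes a given distance multiset. The function names are illustrative only and do not come from the paper; the sketch verifies a solution and does not attempt to find one.

```python
from collections import Counter
from itertools import combinations


def pairwise_distances(points):
    """Return the multiset of all C(n, 2) pairwise distances of points on a line."""
    return Counter(abs(a - b) for a, b in combinations(points, 2))


def is_partial_digest_solution(points, distances):
    """Return True if the points realize exactly the given distance multiset."""
    return pairwise_distances(points) == Counter(distances)


# Small instance: 5 points give C(5, 2) = 10 distances.
given = [2, 2, 3, 3, 4, 5, 6, 7, 8, 10]
print(is_partial_digest_solution([0, 2, 4, 7, 10], given))
# The mirror image of a solution realizes the same distances.
print(is_partial_digest_solution([0, 3, 6, 8, 10], given))
```

Verification is easy, as the sketch shows; the open question mentioned above concerns the complexity of finding such a point set.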
To model real-life data, we introduce two optimization variations of Partial Digest, each capturing a different type of error that occurs in practice. First, we study the computational complexity of a minimization version of Partial Digest in which only a subset of all pairwise distances is given and the rest are missing due to experimental errors. We show that this variation is NP-hard to solve exactly. This result answers an open question posed by Pevzner (2000). We then study a maximization version of Partial Digest in which a superset of all pairwise distances is given, with some additional distances due to inaccurate measurements. We show that this maximization version is NP-hard to approximate to within a factor of \(|D|^{\frac{1}{2} -\varepsilon}\) for any \(\varepsilon > 0\), where \(|D|\) is the number of input distances. This inapproximability result is tight up to low-order terms, as we give a trivial approximation algorithm that achieves a matching approximation ratio.
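The two variations can be illustrated with feasibility checks. The sketch below rests on assumptions that the abstract does not spell out: that the objective in both versions is the number of points, and that the trivial approximation simply places two points at one of the given distances. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import isqrt


def pairwise_distances(points):
    """Return the multiset of pairwise distances of points on a line."""
    return Counter(abs(a - b) for a, b in combinations(points, 2))


def covers_given_distances(points, given):
    """Missing-distance variation: every given distance must occur among the points."""
    have = pairwise_distances(points)
    return all(have[d] >= c for d, c in Counter(given).items())


def uses_only_given_distances(points, given):
    """Extra-distance variation: the points may only use distances that were given."""
    available = Counter(given)
    return all(available[d] >= c for d, c in pairwise_distances(points).items())


def max_points_upper_bound(num_distances):
    """Largest n with n(n-1)/2 <= num_distances (a simple counting bound)."""
    return (1 + isqrt(1 + 8 * num_distances)) // 2


def two_point_solution(given):
    """Assumed trivial solution: two points separated by any one given distance."""
    d = next(iter(given))
    return [0, d]
```

Under these assumptions, n points consume \(\binom{n}{2}\) of the given distances, so no feasible solution of the maximization version has more than roughly \(\sqrt{2|D|}\) points, and a two-point solution is therefore within a factor on the order of \(|D|^{1/2}\) of the optimum.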
A preliminary version of this paper has been published as Technical Report 381, ETH Zurich, Department of Computer Science, October 2002.
References
Alizadeh, F., Karp, R.M., Newberg, L.A., Weisser, D.K.: Physical mapping of chromosomes: A combinatorial problem in molecular biology. In: Symposium on Discrete Algorithms, pp. 371–381 (1993)
Arora, S., Lund, C.: Hardness of approximations. In: Hochbaum, D. (ed.) Approximation Algorithms for NP-Hard Problems, pp. 399–446. PWS Publishing Company (1996)
Bafna, V., Edwards, N.: On de novo interpretation of tandem mass spectra for peptide identification. In: 7th Annual International Conference on Computational Biology (RECOMB 2003), pp. 9–18 (2003)
Baginsky, S.: Personal communication (2003)
Błażewicz, J., Formanowicz, P., Kasprzak, M., Jaroszewski, M., Markiewicz, W.T.: Construction of DNA restriction maps based on a simplified experiment. Bioinformatics 17(5), 398–404 (2001)
Chen, T., Kao, M., Tepel, M., Rush, J., Church, G.M.: A dynamic programming approach to de novo peptide sequencing via tandem mass spectrometry. In: 11th SIAM-ACM Symposium on Discrete Algorithms (SODA), pp. 389–398 (2000)
Cieliebak, M., Eidenbenz, S.: Measurement errors make the partial digest problem NP-hard, manuscript, to be published (2003)
Dakić, T.: On the turnpike problem. PhD thesis, Simon Fraser University (2000)
Dix, T.I., Kieronska, D.H.: Errors between sites in restriction site mapping. Computer Applications in the Biosciences (CABIOS) 4(1), 117–123 (1988)
Fasulo, D.: Algorithms for DNA Restriction Mapping. PhD thesis, University of Washington (2000)
Fütterer, J.: Personal communication (2002)
Håstad, J.: Clique is hard to approximate within \(n^{1-\varepsilon}\). In: Proc. of the Symposium on Foundations of Computer Science (1996)
Inglehart, J., Nelson, P.C.: On the limitations of automated restriction mapping. Computer Applications in the Biosciences (CABIOS) 10(3), 249–261 (1994)
James, P.: Proteome Research: Mass Spectrometry. Springer, Heidelberg (2001)
Lemke, P., Werman, M.: On the complexity of inverting the autocorrelation function of a finite integer sequence, and the problem of locating n points on a line, given the \(\binom{n}{2}\) unlabelled distances between them. Preprint 453, Institute for Mathematics and its Application IMA (1988)
Newberg, L., Naor, D.: A lower bound on the number of solutions to the probed partial digest problem. Advances in Applied Mathematics (ADVAM) 14, 172–183 (1993)
Pandurangan, G., Ramesh, H.: The restriction mapping problem revisited. Journal of Computer and System Sciences (JCSS) (2002) (to appear); Special issue on Computational Biology
Pevzner, P.: Computational Molecular Biology. MIT Press, Cambridge (2000)
Pevzner, P.A., Waterman, M.S.: Open combinatorial problems in computational molecular biology. In: Proc. of the Third Israel Symposium on Theory of Computing and Systems ISTCS, pp. 158–173. IEEE Computer Society Press, Los Alamitos (1995)
Rosenblatt, J., Seymour, P.: The structure of homometric sets. SIAM Journal on Algebraic and Discrete Methods 3(3), 343–350 (1982)
Searls, D.B.: Formal grammars for intermolecular structure. In: Proceedings of the International IEEE Symposium on Intelligence in Neural and Biological Systems (1995)
Setubal, J., Meidanis, J.: Introduction to Computational Molecular Biology. PWS, Boston (1997)
Skiena, S.S., Smith, W., Lemke, P.: Reconstructing sets from interpoint distances. In: Sixth ACM Symposium on Computational Geometry, pp. 332–339 (1990)
Skiena, S.S., Sundaram, G.: A partial digest approach to restriction site mapping. Bulletin of Mathematical Biology 56, 275–294 (1994)
Woeginger, G.J., Yu, Z.L.: On the equal-subset-sum problem. Information Processing Letters 42, 299–302 (1992)
Wright, L.W., Lichter, J.B., Reinitz, J., Shifman, M.A., Kidd, K.K., Miller, P.L.: Computer-assisted restriction mapping: an integrated approach to handling experimental uncertainty. Computer Applications in the Biosciences (CABIOS) 10(4), 435–442 (1994)
Zhang, Z.: An Exponential Example for a Partial Digest Mapping Algorithm. Journal of Computational Biology 1(3), 235–239 (1994)
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cieliebak, M., Eidenbenz, S., Penna, P. (2003). Noisy Data Make the Partial Digest Problem NP-hard. In: Benson, G., Page, R.D.M. (eds) Algorithms in Bioinformatics. WABI 2003. Lecture Notes in Computer Science, vol 2812. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39763-2_9
DOI: https://doi.org/10.1007/978-3-540-39763-2_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20076-5
Online ISBN: 978-3-540-39763-2
eBook Packages: Springer Book Archive