Permutation Editing and Matching via Embeddings

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2076)

Abstract

If the genetic maps of two species are modelled as permutations of (homologous) genes, the number of chromosomal rearrangements (deletions, block moves, inversions, etc.) needed to transform one such permutation into another can be used as a measure of their evolutionary distance. Motivated by such scenarios, we study the problems of computing distances between permutations, matching permutations in sequences, and finding the most similar permutation in a collection (“nearest neighbor”).

We adopt a general approach: embed the relevant permutation distances into well-known vector spaces in an approximately distance-preserving manner, and solve the resulting problems in those spaces. Our results are as follows:

  1. We present the first known approximately distance-preserving embeddings of these permutation distances into well-known spaces.

  2. Using these embeddings, we obtain several results, including the first known efficient solution for approximately solving nearest neighbor problems with permutations and the first known algorithms for finding permutation distances in the “data stream” model.

  3. We consider a novel class of problems called permutation matching problems, which are similar to string matching problems except that the pattern is a permutation (rather than a string), and present linear or near-linear time algorithms for approximately solving them; in contrast, the corresponding string problems take significantly longer.
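
To make the embedding idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not necessarily the construction used in the paper. It maps a permutation of 1..n to the set of its unordered adjacent pairs, with assumed sentinel elements 0 and n+1 at the two ends. That set is a sparse representation of a 0/1 vector indexed by pairs, so the Hamming distance between two such vectors is the size of the symmetric difference. Since a single inversion changes at most two adjacencies, half of the number of broken adjacencies is a lower bound on the number of inversions needed. The function names `adjacency_set`, `hamming_distance`, and `breakpoints` are hypothetical.

```python
def adjacency_set(perm):
    """Sparse 0/1 vector of a permutation of 1..n: its set of unordered adjacent pairs.

    Sentinels 0 and n+1 are added at the ends (an assumption of this sketch),
    so every permutation of length n yields exactly n+1 adjacencies.
    """
    n = len(perm)
    ext = [0] + list(perm) + [n + 1]
    return {frozenset(pair) for pair in zip(ext, ext[1:])}


def hamming_distance(p, q):
    """Hamming distance between the 0/1 adjacency vectors of p and q."""
    return len(adjacency_set(p) ^ adjacency_set(q))


def breakpoints(p, q):
    """Number of adjacencies of p that are not adjacencies of q."""
    return len(adjacency_set(p) - adjacency_set(q))


if __name__ == "__main__":
    p = [1, 2, 3, 4, 5]
    q = [1, 4, 3, 2, 5]  # p with the block 2..4 inverted
    print(hamming_distance(p, q), breakpoints(p, q))
```

Because both adjacency sets have the same size, the Hamming distance here is always exactly twice the breakpoint count, so an approximate nearest neighbor search in Hamming space can stand in for a search by breakpoints.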

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cormode, G., Muthukrishnan, S., Sahinalp, S.C. (2001). Permutation Editing and Matching via Embeddings. In: Orejas, F., Spirakis, P.G., van Leeuwen, J. (eds) Automata, Languages and Programming. ICALP 2001. Lecture Notes in Computer Science, vol 2076. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48224-5_40

  • DOI: https://doi.org/10.1007/3-540-48224-5_40

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42287-7

  • Online ISBN: 978-3-540-48224-6

  • eBook Packages: Springer Book Archive
