
Match Chaining Algorithms for cDNA Mapping

  • Conference paper
Algorithms in Bioinformatics (WABI 2003)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 2812)


Abstract

We propose a new algorithm, called the MCCM (Match Chaining-based cDNA Mapping) algorithm, that maps cDNAs to genomes efficiently and accurately using local matches called MUMs (maximal unique matches) or MRMs (maximal rare matches), which are obtained with suffix trees. From the MUMs (or MRMs), our algorithm selects the matches relevant to the cDNA mapping; we call this selection the match chaining problem. Several O(k log k)-time algorithms, where k is the number of input matches, are known for this problem, but they do not permit overlaps between matches. We propose a new O(k log k)-time algorithm for the problem that allows overlaps; previously, only an O(k^2)-time algorithm existed. We also incorporate a restriction on the distances between matches for accurate cDNA mapping. We examine the performance of our algorithm through computational experiments using sequences from the FANTOM mouse cDNA database and the mouse genome. According to the experiments, the MCCM algorithm is not only very fast but also very accurate: we achieved >95% specificity and >97% sensitivity simultaneously, measured against the mapping results of the FANTOM annotators.
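To make the match chaining problem concrete, here is a minimal Python sketch of the classic overlap-free formulation, not of the MCCM algorithm itself. Everything below is an assumption for illustration rather than something taken from the paper: each match is a tuple (cdna_start, genome_start, length) with 0-based coordinates; a match can follow another only if the earlier one ends at or before the later one's start in both the cDNA and the genome; and a chain's score is the sum of its match lengths. The sketch sweeps over cDNA coordinates and uses a Fenwick tree keyed by genome end position to find the best compatible predecessor, which gives O(k log k) time. The names `chain_matches` and `PrefixMax` are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

# Hypothetical match representation: (cdna_start, genome_start, length), 0-based.
Match = Tuple[int, int, int]


class PrefixMax:
    """Fenwick tree over 1-based indices supporting max-updates and prefix-max queries."""

    def __init__(self, n: int) -> None:
        self.n = n
        # Each cell holds (best chain score, index of the match ending that chain).
        self.tree: List[Tuple[int, int]] = [(0, -1)] * (n + 1)

    def update(self, i: int, value: Tuple[int, int]) -> None:
        # Raise every cell covering position i to at least `value`.
        while i <= self.n:
            if value > self.tree[i]:
                self.tree[i] = value
            i += i & -i

    def query(self, i: int) -> Tuple[int, int]:
        # Maximum over positions 1..i; (0, -1) means "no predecessor".
        best = (0, -1)
        while i > 0:
            if self.tree[i] > best:
                best = self.tree[i]
            i -= i & -i
        return best


def chain_matches(matches: List[Match]) -> Tuple[int, List[int]]:
    """Return (best score, indices of the chosen matches in chain order)."""
    if not matches:
        return 0, []
    k = len(matches)
    # Coordinate-compress the genome end positions of all matches.
    keys = sorted({g + l for _, g, l in matches})
    fenwick = PrefixMax(len(keys))
    score = [0] * k
    prev = [-1] * k
    # Kind 0 = end event, kind 1 = start event; at equal cDNA positions,
    # ends are processed first so touching (non-overlapping) matches can chain.
    events = []
    for idx, (q, g, l) in enumerate(matches):
        events.append((q + l, 0, idx))
        events.append((q, 1, idx))
    events.sort()
    for _, kind, idx in events:
        q, g, l = matches[idx]
        if kind == 1:
            # Best chain among finished matches whose genome end is <= g.
            best, pred = fenwick.query(bisect_right(keys, g))
            score[idx] = best + l
            prev[idx] = pred
        else:
            # The match is now finished in cDNA coordinates; publish its score.
            fenwick.update(bisect_left(keys, g + l) + 1, (score[idx], idx))
    last = max(range(k), key=lambda i: score[i])
    total = score[last]
    chain = []
    while last != -1:
        chain.append(last)
        last = prev[last]
    return total, chain[::-1]
```

For example, the hypothetical matches (0, 0, 10) and (5, 5, 10) lie on the same diagonal and overlap by five bases, so this formulation keeps only one of them and scores 10, even though together they cover 15 bases. That loss is the limitation of overlap-free chaining that the abstract says the MCCM algorithm removes; the sketch implements neither the MCCM algorithm's overlap handling nor its distance restriction, neither of which the abstract details.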

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Shibuya, T., Kurochkin, I. (2003). Match Chaining Algorithms for cDNA Mapping. In: Benson, G., Page, R.D.M. (eds) Algorithms in Bioinformatics. WABI 2003. Lecture Notes in Computer Science, vol 2812. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39763-2_33


  • DOI: https://doi.org/10.1007/978-3-540-39763-2_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20076-5

  • Online ISBN: 978-3-540-39763-2

  • eBook Packages: Springer Book Archive
