HapIso: An Accurate Method for the Haplotype-Specific Isoforms Reconstruction from Long Single-Molecule Reads

Part of the Lecture Notes in Computer Science book series (LNBI, volume 9683)

Abstract

Sequencing of RNA makes it possible to study an individual’s transcriptome landscape and determine allelic expression ratios. Single-molecule protocols generate multi-kilobase reads that are longer than most transcripts, so complete haplotype isoforms can be sequenced and the reads can be partitioned into the two parental haplotypes. Although single-molecule reads are long, their relatively high error rate limits the ability to accurately detect genetic variants and assemble them into haplotype-specific isoforms. In this paper, we present HapIso (Haplotype-specific Isoform Reconstruction), a method that tolerates the relatively high error rate of the single-molecule platform and partitions the isoform reads into the parental alleles. Phasing the reads according to their allele of origin allows our method to efficiently distinguish read errors from true biological mutations. HapIso uses a k-means clustering algorithm to group the reads into two meaningful clusters, maximizing the similarity of reads within a cluster and minimizing the similarity of reads from different clusters. Each cluster corresponds to a parental haplotype. We use family pedigree information to evaluate our approach. Experimental validation suggests that HapIso tolerates the relatively high error rate and accurately partitions the reads into the parental alleles of the isoform transcripts. Furthermore, our method is the first able to reconstruct haplotype-specific isoforms from long single-molecule reads.

The open source Python implementation of HapIso is freely available for download at https://github.com/smangul1/HapIso/.
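The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the HapIso implementation: the read encoding (0 = reference allele, 1 = alternative allele, `None` = site not covered), the distance measure, the deterministic seeding, and all function names below are assumptions made for illustration only. The sketch runs a two-cluster k-means over reads spanning candidate heterozygous sites and takes the rounded cluster centroids as consensus haplotypes, so that an allele seen in only a minority of a cluster's reads is treated as a read error.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical encoding: 0 = reference allele, 1 = alternative allele,
# None = the read does not cover this site.
Allele = Optional[int]


def distance(read: Sequence[Allele], centroid: Sequence[float]) -> float:
    """Mean absolute difference over the sites the read covers."""
    diffs = [abs(a - c) for a, c in zip(read, centroid) if a is not None]
    return sum(diffs) / len(diffs) if diffs else 0.5


def centroid_of(reads: List[Sequence[Allele]], n_sites: int,
                fallback: Sequence[float]) -> List[float]:
    """Per-site mean allele; keep the previous value where no read covers a site."""
    out = []
    for j in range(n_sites):
        col = [r[j] for r in reads if r[j] is not None]
        out.append(sum(col) / len(col) if col else fallback[j])
    return out


def two_means(reads: List[Sequence[Allele]],
              max_iter: int = 50) -> Tuple[List[int], List[float], List[float]]:
    """Partition reads into two clusters with Lloyd-style k-means (k = 2)."""
    n_sites = len(reads[0])
    # Assumed deterministic seeding: the first read and the read farthest from it.
    c0 = [0.5 if a is None else float(a) for a in reads[0]]
    far = max(reads, key=lambda r: distance(r, c0))
    c1 = [0.5 if a is None else float(a) for a in far]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [0 if distance(r, c0) <= distance(r, c1) else 1 for r in reads]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not change
        labels = new_labels
        c0 = centroid_of([r for r, l in zip(reads, labels) if l == 0], n_sites, c0)
        c1 = centroid_of([r for r, l in zip(reads, labels) if l == 1], n_sites, c1)
    return labels, c0, c1


def consensus(centroid: Sequence[float]) -> List[int]:
    """Majority allele per site; minority alleles are treated as read errors."""
    return [int(c >= 0.5) for c in centroid]


# Toy input: six reads over four sites, two of them carrying a single error.
reads = [
    [0, 0, 0, 0],
    [0, 0, 1, 0],      # error at the third site
    [1, 1, 1, 1],
    [1, 1, None, 1],
    [0, None, 0, 0],
    [1, 0, 1, 1],      # error at the second site
]
labels, c0, c1 = two_means(reads)
print(labels, consensus(c0), consensus(c1))
```

On this toy input the two erroneous reads still join the cluster of their true haplotype, and the majority vote within each cluster removes the errors. Real data would additionally require variant-site selection and alignment handling, which this sketch leaves out.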

Keywords

  • Single Nucleotide Variation
  • Reference Allele
  • Heterozygous Locus
  • Parental Haplotype
  • Mendelian Inconsistency

Serghei Mangul and Harry Yang contributed equally to this work.


Author information

Corresponding authors

Correspondence to Serghei Mangul, Harry (Taegyun) Yang or Eleazar Eskin.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Mangul, S., Yang, H.(T.), Hormozdiari, F., Tseng, E., Zelikovsky, A., Eskin, E. (2016). HapIso: An Accurate Method for the Haplotype-Specific Isoforms Reconstruction from Long Single-Molecule Reads. In: Bourgeois, A., Skums, P., Wan, X., Zelikovsky, A. (eds) Bioinformatics Research and Applications. ISBRA 2016. Lecture Notes in Computer Science, vol 9683. Springer, Cham. https://doi.org/10.1007/978-3-319-38782-6_7

  • DOI: https://doi.org/10.1007/978-3-319-38782-6_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-38781-9

  • Online ISBN: 978-3-319-38782-6

  • eBook Packages: Computer Science, Computer Science (R0)