Handling High-Dimension (High-Feature) MicroRNA Data

Bioinformatics in MicroRNA Research

Part of the book series: Methods in Molecular Biology (MIMB, volume 1617)

Abstract

High-dimensional data, that is, data described by a large number of feature variables, are commonly used to characterize microRNA sequence and microarray data. As a consequence, the curse of dimensionality often becomes a problem. High-dimensional variables make the data difficult to process and hard to interpret. Moreover, because the sample size is usually limited, the more variables there are, the larger the statistical error introduced during data processing. To reduce the number of variables, a degenerate k-mer method has been proposed; to improve statistical robustness, the gapped k-mer method was introduced. The last part of this chapter describes some traditional supervised and unsupervised mathematical methods that are used to reduce the dimensionality of the data.
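
As a rough illustration of the k-mer ideas named in the abstract, the following Python sketch counts plain k-mers, k-mers over a reduced alphabet, and gapped k-mers for a toy RNA string. It is a minimal sketch under stated assumptions, not the chapter's own procedure: the purine/pyrimidine mapping is a hypothetical choice that only shows how collapsing bases shrinks the feature space from 4^k to 2^k, and the gapped variant follows the general idea of keeping k informative positions inside a longer window. All function names and the toy sequence are invented for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purine (R) / pyrimidine (Y) mapping, used only to show
# how a reduced alphabet shrinks the k-mer feature space (assumption).
DEGENERATE = {"A": "R", "G": "R", "C": "Y", "U": "Y", "T": "Y"}


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def degenerate_kmer_counts(seq, k):
    """Count k-mers after collapsing each base to the reduced alphabet.

    Raises KeyError for symbols outside A, C, G, U, T.
    """
    reduced = "".join(DEGENERATE[base] for base in seq)
    return kmer_counts(reduced, k)


def gapped_kmer_counts(seq, length, k):
    """Count gapped k-mers: in each window of `length` bases, keep k
    positions and mask the remaining positions with '-'."""
    counts = Counter()
    for i in range(len(seq) - length + 1):
        window = seq[i:i + length]
        for keep in combinations(range(length), k):
            masked = "".join(
                window[j] if j in keep else "-" for j in range(length)
            )
            counts[masked] += 1
    return counts


toy_seq = "AUGCAUGC"  # toy sequence, not real miRNA data
plain = kmer_counts(toy_seq, 2)
reduced = degenerate_kmer_counts(toy_seq, 2)
gapped = gapped_kmer_counts(toy_seq, 3, 2)
print(len(plain), "distinct 2-mers over A/C/G/U")
print(len(reduced), "distinct 2-mers over R/Y")
print(len(gapped), "distinct gapped 2-mers in windows of 3")
```

In this sketch each gapped pattern pools counts from several full-length windows, which is the intuition behind using gapped k-mers when the number of samples is small relative to the number of possible k-mers.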

Author information

Correspondence to Wenjun Lan.

Copyright information

© 2017 Springer Science+Business Media LLC

About this protocol

Cite this protocol

Hu, Y., Lan, W., Miller, D. (2017). Handling High-Dimension (High-Feature) MicroRNA Data. In: Huang, J., et al. Bioinformatics in MicroRNA Research. Methods in Molecular Biology, vol 1617. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-7046-9_13

  • DOI: https://doi.org/10.1007/978-1-4939-7046-9_13

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-7044-5

  • Online ISBN: 978-1-4939-7046-9

  • eBook Packages: Springer Protocols
