Abstract
High-dimensional data, that is, data described by a large number of features, are often used to characterize microRNA sequence and microarray data. As a consequence, the curse of dimensionality often becomes a problem. High-dimensional variables are difficult to process and hard to interpret. In addition, because the sample size is usually limited, the more variables there are, the larger the statistical error introduced during data processing. To reduce the number of variables, a degenerate k-mer method has been proposed. To improve statistical robustness, the gapped k-mer method was introduced. The last part of this chapter also describes some traditional supervised and unsupervised mathematical methods used to reduce the dimensionality of the data.
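As a rough illustration of why these k-mer variants shrink the feature space, the sketch below counts plain, degenerate, and gapped k-mers for a single RNA sequence. The purine/pyrimidine grouping, the mask `"101"`, the example sequence, and all function names are assumptions chosen for illustration; the abstract does not specify the exact schemes used in the chapter.

```python
from collections import Counter
from itertools import product

# Assumed degenerate alphabet: purines (A, G) -> R, pyrimidines (C, U, T) -> Y.
DEGENERATE = {"A": "R", "G": "R", "C": "Y", "U": "Y", "T": "Y"}


def kmer_counts(seq, k, alphabet):
    """Return a count for every possible k-mer over `alphabet`, zeros included."""
    seen = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    all_kmers = ("".join(p) for p in product(alphabet, repeat=k))
    return {kmer: seen.get(kmer, 0) for kmer in all_kmers}


def to_degenerate(seq):
    """Map each nucleotide to its assumed degenerate class (R or Y)."""
    return "".join(DEGENERATE[base] for base in seq.upper())


def gapped_kmer_counts(seq, mask):
    """Count windows of len(mask); positions where mask is '0' become the wildcard '-'."""
    width = len(mask)
    patterns = (
        "".join(b if m == "1" else "-" for b, m in zip(seq[i:i + width], mask))
        for i in range(len(seq) - width + 1)
    )
    return Counter(patterns)


# Hypothetical 22-nt input sequence used only as an example.
seq = "UGAGGUAGUAGGUUGUAUAGUU"

full = kmer_counts(seq, 3, "ACGU")                 # 4**3 = 64 features
reduced = kmer_counts(to_degenerate(seq), 3, "RY")  # 2**3 = 8 features
gapped = gapped_kmer_counts(seq, "101")             # at most 4**2 = 16 features

print(len(full), len(reduced), len(gapped))
```

With k = 3, the degenerate alphabet cuts the feature vector from 64 to 8 entries, and the gapped mask pools windows that differ only at the wildcard position, so each remaining feature is estimated from more observations.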
Copyright information
© 2017 Springer Science+Business Media LLC
About this protocol
Cite this protocol
Hu, Y., Lan, W., Miller, D. (2017). Handling High-Dimension (High-Feature) MicroRNA Data. In: Huang, J., et al. Bioinformatics in MicroRNA Research. Methods in Molecular Biology, vol 1617. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-7046-9_13
DOI: https://doi.org/10.1007/978-1-4939-7046-9_13
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-7044-5
Online ISBN: 978-1-4939-7046-9
eBook Packages: Springer Protocols