Regression Analysis of Large Research Data: Dimensional Reduction Techniques

  • Yagyanath Rimal
Conference paper
Part of the Learning and Analytics in Intelligent Systems book series (LAIS, volume 12)


This article examines which form of regression analysis best suits large research data that exhibit multicollinearity, using dimensional reduction techniques implemented in R. The main aim is to explore principal component regression and partial least squares regression on two secondary datasets, and to identify the effects of each regression through graphical interpretation. The Iris dataset, with 150 records and 4 attributes, was used for the principal component analysis: the attributes were grouped into four principal components, of which PC1 explained 73% of the variability in the data and PC2 only 22%. After multinomial regression, the confusion matrix showed a misclassification rate of 0.06%. The correlation coefficient lies between −1 and 1; the highest observed value, 0.97, indicates a strong positive correlation involving the petal width attribute, and multiple regression analysis was carried out on these data, which comprise 4 species. Similarly, the Hitters dataset, with 264 records and 20 attributes, was used for partial least squares regression. Validation on randomly chosen segments suggests that 50% of the batters' data was drastically reduced. This article therefore presents the best way to perform two different regression analyses on data with multicollinearity using R.
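The three quantities reported in the abstract (a correlation coefficient, the share of variance explained by each principal component, and a misclassification rate read from a confusion matrix) can be illustrated with a minimal sketch. The paper itself works in R; the Python below is not the author's code, the function names are hypothetical, and every number in it is a toy value chosen for illustration rather than taken from the paper's datasets.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def explained_variance_ratio(eigenvalues):
    """Share of total variance carried by each principal component."""
    total = sum(eigenvalues)
    return [value / total for value in eigenvalues]


def misclassification_rate(confusion):
    """Fraction of off-diagonal counts in a square confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total - correct) / total


# Hypothetical toy measurements (assumption: not the paper's data).
length = [1.4, 1.5, 4.5, 4.7, 5.9, 6.1]
width = [0.2, 0.3, 1.4, 1.5, 2.1, 2.4]
r = pearson(length, width)

# Hypothetical eigenvalues of a 4-variable correlation matrix.
ratios = explained_variance_ratio([3.0, 0.8, 0.15, 0.05])

# Hypothetical 3-class confusion matrix (rows: actual, columns: predicted).
confusion = [
    [10, 0, 0],
    [0, 9, 1],
    [0, 2, 8],
]
rate = misclassification_rate(confusion)

print(f"r = {r:.3f}")
print("variance explained:", [round(v, 3) for v in ratios])
print(f"misclassification rate = {rate:.3f}")
```

A correlation close to 1 between two predictors, as in the toy `length` and `width` values, is the kind of multicollinearity that motivates replacing the original attributes with a few components before fitting a regression.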


Principal component analysis; Partial least squares regression


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Faculty of Science and Technology, Pokhara University, Pokhara, Nepal
