Processing Data in R and Python

  • Guy Lebanon
  • Mohamed El-Geish


There is no shortcut to knowledge, and there are no worthwhile data without preprocessing. In the first three sections of this chapter, we discuss situations that necessitate data preprocessing and how to handle them. In the final section, we discuss how to manipulate data in general: specifically, how to manipulate data in R using the reshape2 and plyr packages, and in Python using the pandas module.
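As a rough illustration of the kind of manipulation the final section covers, the sketch below uses pandas to fill a missing value, reshape a table from wide to long (similar in spirit to melt in reshape2), and compute per-group means (similar in spirit to the split-apply-combine style of plyr). The data frame, its column names, and the choice of mean imputation are invented for illustration and are not taken from the chapter.

```python
import pandas as pd

# Hypothetical toy data: one row per subject, one column per measurement.
wide = pd.DataFrame({
    "subject": ["a", "b", "c"],
    "height": [1.70, None, 1.82],   # None becomes NaN (a missing value)
    "weight": [65.0, 80.0, 77.0],
})

# Preprocessing: replace the missing height with the column mean.
# This is only one simple option for handling missing data.
wide["height"] = wide["height"].fillna(wide["height"].mean())

# Reshape from wide to long format: one row per (subject, variable) pair.
long = wide.melt(id_vars="subject", var_name="variable", value_name="value")

# Split-apply-combine: split rows by variable, apply the mean, combine results.
means = long.groupby("variable")["value"].mean()

print(long)
print(means)
```

The long format makes the grouping step uniform: every measurement sits in the same value column, so a single groupby call summarizes all variables at once.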


Keywords (machine-generated, not supplied by the authors): Plyr package; Pandas module; Dataframe; Import numpy; Reshape package



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Guy Lebanon, Amazon, Menlo Park, USA
  • Mohamed El-Geish, Voicera, Santa Clara, USA
