A modest proposal: an approach to making the internal R system extensible
The R computing environment has become an important part of the statistical community and has fostered the development of over a thousand add-on packages, many representing state-of-the-art research in statistical methodology. Although it is relatively easy to develop functionality on top of the system, it is very difficult for developers to directly extend the core system itself: the language, the interpreter and the internal data structures. Yet the ability to easily introduce new core, first-class data structures that are customized and efficient is becoming essential in this era of large, complex data sets and innovative algorithms and data structures. While the community that might use such a facility to introduce new data types may be small, it is potentially very talented and important, and may lead to significant innovations that allow us to continue to leverage R in rich new ways for the next 5 years or more. I describe some of the difficulties that people encounter in extending the system and suggest that an object-oriented architecture for the internal implementation of R (or any system) would make such low-level internals extensible by package developers and not just by the core development team. This would promote rich experimentation that would allow us and others to approach new styles of computation in R, while maintaining the existing community that adds so much value to the R environment. Specifically, transforming the R implementation from a representation-specific architecture to an architecture based on C++ abstract (virtual) interfaces may be the least disruptive approach to the continued evolution of R, and would bring many advantages along with some technical challenges. Such an approach involves many technical details and potential degradations in performance. Because of the limited length of this paper, I do not explore these issues in great detail but introduce the basic concepts.
I do, however, refer to some technical aspects that are best understood with some knowledge of the implementation of R at the level of the .Call() interface.
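The contrast between a representation-specific and an interface-based internal architecture can be sketched in a few lines of C++. In R's C API, internal and .Call() code typically reaches the data of a numeric vector through accessors such as REAL(), which return a pointer to a contiguous block of doubles, so every routine is tied to that one memory layout. The sketch below is a hypothetical illustration, not R's API or the design proposed in the paper: the names NumericVector, DenseNumericVector, SequenceVector and vector_sum are invented for this example. Internal routines are written against an abstract interface, and a package could then add a new representation (here, an arithmetic sequence computed on demand) without modifying those routines.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical abstract interface for a numeric vector. Code written against
// it does not assume the values live in one contiguous double* block.
class NumericVector {
public:
    virtual ~NumericVector() = default;
    virtual std::size_t length() const = 0;
    virtual double elt(std::size_t i) const = 0;  // i-th element, 0-based
};

// Conventional representation: values stored contiguously in memory.
class DenseNumericVector : public NumericVector {
public:
    explicit DenseNumericVector(std::vector<double> values)
        : values_(std::move(values)) {}
    std::size_t length() const override { return values_.size(); }
    double elt(std::size_t i) const override {
        assert(i < values_.size());  // bounds check in debug builds
        return values_[i];
    }
private:
    std::vector<double> values_;
};

// Hypothetical package-defined representation: an arithmetic sequence
// from, from + by, from + 2*by, ... whose elements are computed on demand,
// so a sequence of any length needs only three values of storage.
class SequenceVector : public NumericVector {
public:
    SequenceVector(double from, double by, std::size_t n)
        : from_(from), by_(by), n_(n) {}
    std::size_t length() const override { return n_; }
    double elt(std::size_t i) const override {
        assert(i < n_);  // bounds check in debug builds
        return from_ + by_ * static_cast<double>(i);
    }
private:
    double from_;
    double by_;
    std::size_t n_;
};

// An "internal" routine written once against the interface. It works for
// every representation, including ones added later by packages.
double vector_sum(const NumericVector& x) {
    double total = 0.0;
    for (std::size_t i = 0; i < x.length(); ++i) {
        total += x.elt(i);
    }
    return total;
}
```

For example, vector_sum applied to SequenceVector(1.0, 1.0, 100), the analogue of 1:100 in R, returns 5050 without ever allocating 100 doubles. The cost of this flexibility is a virtual call per element access, which is one source of the potential performance degradation the abstract mentions; a realistic design would likely also offer bulk or contiguous-access fast paths for representations that support them.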
Keywords: Data type · Vector type · External pointer · Native code · Numeric vector