
# R Recipes - A Problem-Solution Approach by Larry A. Pace

__Bibliographic Information__:

- Title: R Recipes - A Problem-Solution Approach
- Author: Larry A. Pace
- Edition: 1st
- Publisher: Apress
- Length: 402 pages
- File size: 3.70 MB
- Language: English

R is an open-source implementation of the programming language S, created at Bell Laboratories by John Chambers, Rick Becker, and Allan Wilks. In addition to R, S is the basis of the commercially available S-PLUS system. Widely recognized as the chief architect of S, Chambers in 1998 won the prestigious Software System Award from the Association for Computing Machinery, which said Chambers’ design of the S system “forever altered how people analyze, visualize, and manipulate data.”

Think of R as an integrated system or environment that gives users multiple ways to access its many functions and features. You can use R as an interactive, interpreted command-line language, much like a calculator: type a command, press Enter, and R prints the answer in the R console. R is simultaneously a functional language and an object-oriented language. Beyond its thousands of contributed packages, R has the programming features found in any general-purpose language: it supports conditionals and loops, and it lets you write custom functions and specify various input and output options.
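The interactive and programming features just described can be sketched in a few lines of R. This is a minimal illustration, not an example taken from the book. `parity_labels()` is a hypothetical helper written only for this sketch. The last line shows R's functional side by passing an anonymous function to the base function `vapply()`.

```r
# Interactive use: type an expression, press Enter, and R prints the result
2 + 3 * 4
# [1] 14

# A custom function that combines a loop and a conditional.
# parity_labels() is a hypothetical helper written for this sketch.
parity_labels <- function(x) {
  labels <- character(length(x))   # preallocate the result vector
  for (i in seq_along(x)) {        # loop over each element's position
    if (x[i] %% 2 == 0) {          # %% is R's modulus operator
      labels[i] <- "even"
    } else {
      labels[i] <- "odd"
    }
  }
  labels                           # the last expression is the return value
}

parity_labels(1:5)   # returns "odd" "even" "odd" "even" "odd"

# The same result in functional style: pass an anonymous function to vapply()
vapply(1:5, function(n) if (n %% 2 == 0) "even" else "odd", character(1))
```

The explicit loop mirrors what you would write in Fortran or C. The `vapply()` version is closer to idiomatic R, because it treats the function itself as a value.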

R is widely used as a statistical computing environment, but the R Core Team prefers to describe R as an environment “within which many classical and modern statistical techniques have been implemented.” In addition to its statistical prowess, R provides impressive and flexible graphics capabilities, and many users are attracted to R primarily for them. R offers both basic and advanced plotting functions, with extensive options for customization.

Chambers and others at Bell Labs were developing S while I was in college and grad school, and of course I was completely oblivious to that fact, even though my major professor and I were consulting with another AT&T division at the time. I began my own statistical software journey writing programs in Fortran. If a given program lacked an analysis I needed, such as a routine for calculating an intraclass correlation, I would write my own program. When I was in graduate school, BMDP and SAS were available in batch versions for mainframe computers; one had to learn Job Control Language (JCL) in order to tell the computer which tapes to load. I typed punch cards and used a card reader to read in the JCL and data.

On a far larger and more sophisticated scale, this is essentially why the computer scientists at Bell Labs created S (for statistics). Fortran was, and still is, a general-purpose language, but it had few statistical capabilities. The design of S began in 1976 with an informal meeting at Bell Labs to discuss a high-level language that would provide an interface to “algorithms,” a term the designers used to mean Fortran-callable subroutines. Like its predecessor S, R can easily and transparently call compiled code written in other languages, including Fortran and C++, among others. R can also be interfaced with a variety of other software, such as Python and SPSS.

R works in batch mode, but its most popular use is as an interactive system for data analysis, calculation, and graphics, running in a windowed environment. R runs on Linux, Windows, and Mac systems. Be forewarned that R is not a point-and-click graphical user interface (GUI) program such as SPSS or Minitab. Unlike those programs, R produces terse output, but it can be queried for more information when you need it. In this book, you will see screen captures of R running in the Windows operating system.
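A linear model shows the contrast between terse default output and detail on request. This sketch uses the built-in `mtcars` data set and the base R functions `lm()`, `summary()`, `coef()`, and `str()`. The particular model, fuel economy regressed on car weight, is an illustrative choice and does not come from the book.

```r
# mtcars ships with R (datasets package): 32 cars, including mpg and weight (wt)
fit <- lm(mpg ~ wt, data = mtcars)   # regress fuel economy on car weight

# Printing the model is terse: just the call and the two coefficients
fit

# summary() asks for more: residuals, standard errors, t tests, R-squared
summary(fit)

# Specific pieces can also be pulled out directly
coef(fit)                 # named vector with "(Intercept)" and "wt"
summary(fit)$r.squared    # proportion of variance in mpg explained by weight
str(fit, max.level = 1)   # list the components stored in the model object
```

The same pattern appears throughout R. A fitted object prints briefly, and functions such as `summary()` and `str()` reveal more of what it contains.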

According to my friend and colleague Dr. Nathan Goodman, a computer scientist and bioinformatics expert, statistical analysis essentially boils down to four empirical problems: problems involving description, differences, relationships, and classification. I agree wholeheartedly with Nat. All the problems and solutions presented in this book fall into one or more of those general categories. The specific problems are many, but they mostly reduce to these four situations.
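The following minimal sketch makes the four categories concrete. It pairs each category with a base R function and applies them to the built-in `mtcars` data set. The data set and functions are illustrative choices and not necessarily what the book uses. In particular, logistic regression via `glm()` is just one of many possible classification methods.

```r
d <- mtcars   # built-in data: 32 cars; am is transmission (0 = automatic, 1 = manual)

# 1. Description: summarize a single variable
summary(d$mpg)
sd(d$mpg)

# 2. Differences: do manual and automatic cars differ in mean mpg?
t.test(mpg ~ am, data = d)   # Welch two-sample t test by default

# 3. Relationships: how strongly are weight and mpg associated?
cor(d$wt, d$mpg)             # Pearson correlation
cor.test(d$wt, d$mpg)        # the same correlation with a significance test

# 4. Classification: predict transmission type from weight
# (logistic regression is one illustrative choice among many classifiers)
clf  <- glm(am ~ wt, data = d, family = binomial)
prob <- predict(clf, type = "response")   # fitted probabilities that am == 1
pred <- ifelse(prob > 0.5, 1, 0)          # classify with a 0.5 cutoff
table(predicted = pred, actual = d$am)    # confusion matrix
mean(pred == d$am)                        # proportion classified correctly
```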