Sports Analytics in Practice with R. Ted Kwartler
The code in this book should work in either a cloud or a local environment, but installing base-R and R-Studio on a server is not covered. Therefore, please download the R-Studio desktop IDE by navigating to https://www.rstudio.com/products/rstudio.
The R-Studio IDE, or Integrated Development Environment, adds functionality and a modern user interface to base-R. The IDE aggregates common functionality used for software development and statistical analysis.
Essentially, R-Studio sits on top of base-R. The IDE provides the modern GUI expected by today’s computer users while also adding functionality, including version control, terminal access and, perhaps most importantly, an easy way to create and view visualizations for export and saving to disk. Figure 1.1 illustrates the basic relationship between base-R and R-Studio. As you can see, without base-R the IDE will not function because none of the computational functions exist in the IDE itself.
Figure 1.1 The relationship between base-R and R-Studio.
Now that you have both base-R and R-Studio, let’s start to explore the programming environment. Think of an R environment as a relatively generic piece of statistical software. Once downloaded, it can programmatically perform the tasks found in many popular spreadsheet programs, whether online or installed on a laptop. The advantage of R is its extensibility, mentioned earlier. R can be specialized from a generic set of statistical tools into a more interesting and nuanced piece of software. This is done by downloading specialized packages and then loading, in the console, the package needed for the task at hand.
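As a minimal sketch of that download-then-load workflow: the package named in the commented-out install line is only an illustrative assumption, and the `tools` package is used for the live call because it ships with R but is not attached by default.

```r
# One-time download from CRAN (needs internet access), shown commented out.
# "ggplot2" is only an illustrative package name, not one this section requires.
# install.packages("ggplot2")

# Load a package for the current session. The tools package ships with R
# but is not attached by default, so no download is needed here.
library(tools)

# file_ext() can now be called by name because its package is loaded.
scriptExtension <- file_ext("analysis.R")
print(scriptExtension)
```

Installing happens once per machine, while `library()` must be called in every new R session that needs the package.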
Figure 1.2 shows the IDE itself without a “script” to be executed. For now, focus on the “console” section in Figure 1.2. This is the lower left-hand side containing a “>” symbol. This is the section where code will be executed and results are returned.
Figure 1.2 The R-Studio IDE console.
The next step is to navigate to “File > New File > R Script” in the upper left of the IDE. This will open another pane in the IDE. The script pane will be located in the upper left section of the IDE and will shrink the console on the lower left-hand side. While the console is where code is executed and computation enacted, the scripting section is where you will write code that is then run within the console. Think of an R script as merely a lightweight text file that can be saved and repeated by running in the console. A script is nothing more than a set of instructions that have not been enacted yet. To save an R script, navigate to “File > Save” and then simply follow the IDE dialog. The rest of the book provides R scripts for you to execute along with explanations along the way. Figure 1.3 shows the new script pane with some basic example code.
Figure 1.3 The upper left R script with basic commands and comments.
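The idea that a script is only a text file of instructions that have not been enacted yet can be sketched in code. This example is not from the book: it assumes a temporary file path and a hypothetical object name, `scriptResult`, and uses base R's `source()` to run the whole file at once instead of clicking “run” line by line.

```r
# Write a two-line script to a temporary file; nothing is computed yet.
scriptPath <- tempfile(fileext = ".R")
writeLines(c("# Add two numbers", "scriptResult <- 2 + 2"), scriptPath)

# At this point the script is only text on disk.
readLines(scriptPath)

# source() runs each line of the file in order, so the instructions are now enacted.
source(scriptPath)
print(scriptResult)
```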
Of particular note in the script shown in Figure 1.3 are two comments and two code examples. A comment begins with a `#`. This tells R to ignore everything after it on that line. As you begin your learning journey programming in R, it is a best practice to add comments to remind yourself of the nuances of the code to be executed. Thus, feel free to make a copy of any scripts throughout the book, add comments, and save them for yourself.
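A short sketch of the two common comment placements follows. The object name `pointsScored` is hypothetical and used only for illustration.

```r
# A full-line comment: R skips this entire line.
pointsScored <- 2 + 2  # A trailing comment: the code before the # still runs.
print(pointsScored)
```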
The first code to be executed, beginning on a non-commented line, is a simple arithmetic operation shown below.
2 + 2
Since this is in a script, it will not be run until you send it to the console. Further, as you can guess, the operation `2 + 2` has a single result, `4`. An easy way to run the script is to place your cursor on the line you want to execute and click the “run” icon on the upper right-hand side of the script pane. When this is done, the code is transferred to the console and executed, returning the single answer as expected. Figure 1.4 illustrates the transfer between script and console.
Figure 1.4 Showing the code execution on line 2 of the script being transferred to the console where the result 4 is printed.
Next, let’s execute another command, which will illustrate another pane of the IDE. If your cursor is on line 5 of the R script, `plot(x = 1, y = 2)`, and you click the “run” icon, you will see a simple scatter plot appear in the lower right utility pane titled “Plots.” Each tab of the utility pane is described below:
Files—This is a file navigation view, where you can review folders and files to be used in analysis or saved to disk.
Plots—For reviewing any static visualizations the R code creates. This pane can also be used for resizing the image using a graphical user interface (GUI) and saving the plots to disk.
Packages—Since R needs to be specialized for a particular task, this pane lists your local package library with official documentation and accompanying examples, vignettes, and tutorials.
Help—Provides various resources for obtaining help with R and its many tasks.
Viewer—This pane allows you to view the small webpages and dynamic interactive plots which R can create.
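Saving a plot to disk, described above for the Plots tab, also has a scripted counterpart. The sketch below is one possible approach using base R's `pdf()` graphics device; the temporary file location and the name `plotPath` are assumptions chosen for illustration.

```r
# Choose an output location; tempfile() is used here only for illustration.
plotPath <- tempfile(fileext = ".pdf")

# Open a PDF graphics device, draw the plot into it, then close the device.
pdf(plotPath)
plot(x = 1, y = 2)
dev.off()

# Confirm the file was written to disk.
file.exists(plotPath)
```

While the PDF device is open, the plot is drawn into the file rather than the Plots tab, so the device must be closed with `dev.off()` before the file is complete.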
Figure 1.5 shows the result: a basic, yet not visually appealing, scatter plot with a single point. Rest assured the plots throughout the book are more compelling than this simplistic example. The x,y coordinates are defined in code as `x = 1` and `y = 2`.
Figure 1.5 The basic scatter plot is instantiated in the lower right “Plots” pane.
Next, let’s focus on the remaining upper right pane of the IDE. The primary tab of interest is the “Environment” tab. R works by creating objects, which store data. When an object is created, it is held in active memory, your computer’s RAM. Any active objects in your R session will be shown in the “Environment” tab in the upper right. Add the following code in the script (upper left) pane, then click “run” to instantiate an object in your environment. Notice the first line begins with a `#`, so the non-code comment “Create an object” acts as a signpost for you, while the next line actually creates the object. Specifically, the object name is `xVal` and it is declared to have a value of `1`. Moreover, the value is assigned to the object name with the assignment operator `<-`. In the R language you can also use an equal sign for assignment. However, most R style guides use the `<-` operator and this book follows that direction.
# Create an object
xVal <- 1
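The two assignment forms, and what the Environment tab tracks, can be compared in a short sketch. The object `yVal` is a hypothetical name, and `ls()` and `rm()` are base R functions that the text above does not cover; they appear here only for illustration.

```r
# Create an object with the arrow assignment operator
xVal <- 1

# The equal sign also assigns; yVal is a hypothetical name for illustration
yVal = 2

# ls() lists the names of objects in the current environment
ls()

# Objects can be reused by name in later code
xVal + yVal

# rm() removes an object from memory
rm(yVal)
exists("yVal")
```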
When run, the upper right Environment tab will now show an object, `xVal`, that is held in memory for use later in the script. Of course, these objects can become much more complex than a single value. Next, add more code to your script utilizing the `xVal` object rather than declaring the value explicitly. The following code can be added to your script and then run to recreate the simple scatter plot from before. The difference is that R has substituted the `x = xVal` input with `x = 1` since that is the object’s actual value. The only difference in the plots is that the second one has a different x-axis title because the value was derived from the