Sports Analytics in Practice with R. Ted Kwartler
code chunks, the new object in the environment, and the recreated plot in the utility pane.
Figure 1.6 The renewed plot with an R object in the environment.
# Create a plot with an object value
plot(x = xVal, y = 2)
The basic functionality of R is underpinned by functions and objects. Each package that specializes R comes with a set of functions, usually coordinated for a particular task such as data manipulation or obtaining sports data. Functions accept inputs, including objects, and manipulate those inputs, most often to create new objects or to overwrite existing ones. For example, the following code creates a new object `newObj` using the assignment operator and, on the right-hand side, employs a base-R function. Base-R functions do not require any libraries to be loaded, so there is no need to specialize the R environment for a particular task. The `newObj` variable is declared as the result of the function `round` with two input parameters. The first parameter accepts the number to be rounded, `1.23`. The second parameter, `digits = 0`, is a tuning parameter that changes the behavior of `round` by declaring the number of decimal places to round the input to. Thus, when you add the following code to the script and then execute it in the console, the resulting `newObj` variable has a value of 1. As before, the `newObj` object will be stored actively and shown in the “Environment” tab. Keep in mind that the inputs themselves can be objects, not just declared values. As a result, scripts manipulate objects and often pass them to another function later in the script.
# Create a new object with a function
newObj <- round(1.23, digits = 0)
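The point that a function's inputs can themselves be objects can be seen in a minimal sketch. The object names `rawVal` and `roundedVal` are hypothetical and chosen only for illustration; they do not appear elsewhere in the chapter.

```r
# Hypothetical object names, used only for this illustration
rawVal <- 1.23

# Pass the object, rather than a typed-in number, to round()
roundedVal <- round(rawVal, digits = 1)
print(roundedVal)

# Pass the result to the function again, overwriting the existing object
roundedVal <- round(roundedVal, digits = 0)
print(roundedVal)
```

After the second call, `roundedVal` holds 1, the same value as `newObj`, because the object was overwritten rather than a new one being created.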
This book will illustrate many functions, both in base-R and within specialized packages, applied in a sports context. R has tens of thousands of packages with corresponding functions. The rest of this book will often defer to base-R functions for the sake of standardization, stability, and ease of understanding rather than utilize an esoteric package. This is a deliberate choice to improve conceptual understanding, but it does leave room for code optimization and improvement.
There are additional intermediate programming operators that are employed in this book. In fact, there are multiple types of logical and arithmetic operators, but for the most part the scripts in this book focus on one use case at a time, with linear thinking, so you can concentrate on the concepts and applications more than on concise code. However, Table 1.1 describes the three control flow operators used in the book, each with a code example for you to try in your script and console. Within the FOR loop, a set of code is run repeatedly with a variable that changes each time through. For the latter two, the IF and IF-ELSE control flows, a logical statement is evaluated and controls the code’s behavior. If the statement evaluates to TRUE, the code is executed; otherwise it is ignored or, in the IF-ELSE case, an alternative block of code is run.
Table 1.1 Three simple control flows in R: the FOR loop, the IF statement, and the IF-ELSE statement.
Name | Code | Description |
---|---|---|
FOR loop | `for (i in 1:4){ print(i + 2) }` | The FOR loop has a dynamic variable `i` which updates on each pass. Here, `i` takes the values 1, 2, 3, and 4 in turn. The code within the curly brackets executes with the updated `i` value. The first time through the loop `i` equals `1`, and with `+ 2` the value `3` is printed to the console. The second time through, `i` updates to `2` and `+ 2` is applied again, so the value `4` is printed. The loop runs 4 times because of the `1:4` parameter. |
IF statement | `if(xVal == 1){ print('xVal is equal to one.') }` | The IF statement is a control operator. After the `if` keyword, the statement inside the parentheses is checked. If it evaluates to TRUE, then the code within the curly brackets is executed. In this example, the statement checks whether the variable `xVal` is equal to `1`. Since it is, the code in the curly brackets executes and the message “xVal is equal to one.” is printed to the console. If the statement does not evaluate to TRUE, the code inside the curly brackets is ignored. For example, if `xVal` were `2`, the code block would not run. |
IF ELSE statement | `if(xVal == 1){ print('xVal is equal to one.') } else { print('xVal is not equal to one.') }` | The IF-ELSE control flow adds another layer to the previous IF statement. Now a new set of curly brackets is added along with the `else` keyword. This statement will execute one of the two code chunks within the curly brackets based on the TRUE or FALSE result of the logical statement. Here, if `xVal == 1`, then the first message is printed, same as before. However, for any other value of `xVal`, the second bit of code is run. For example, if `xVal` were `2`, the logical statement would evaluate to FALSE and the second message “xVal is not equal to one.” would be printed to the console. |
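The control flows in Table 1.1 can also be nested. The following is a minimal sketch, not an example from the book: the vector `checkVals`, the loop variable `idx`, and the `messages` object are hypothetical names. The loop variable is deliberately not called `i`, so the `i` object created by the FOR loop in Table 1.1 is left untouched.

```r
# Hypothetical values to check; not data from the book
checkVals <- c(1, 2, 1)

# Empty character vector to collect the messages
messages <- character(0)

# FOR loop over the positions 1, 2, 3 of checkVals
for (idx in seq_along(checkVals)) {
  # IF-ELSE: exactly one of the two code chunks runs on each pass
  if (checkVals[idx] == 1) {
    msg <- paste('Element', idx, 'is equal to one.')
  } else {
    msg <- paste('Element', idx, 'is not equal to one.')
  }
  print(msg)
  messages <- c(messages, msg)
}
```

Each pass through the loop evaluates the logical statement once, so three messages are printed: the first and third elements take the first branch and the second element takes the `else` branch.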
Another aspect of R programming is that it can utilize various data object types, referred to as classes. Previously, the `xVal` object was a single numeric value; however, R can analyze and work with other common data types. First, R can understand the difference between an integer, which is a whole number, and a numeric value. The distinction is that a numeric data type can be a number with a decimal. Although this difference can seem subtle, in some computational work it has an impact. If you’ve been following the simple code examples in this chapter, you should have `xVal`, `newObj`, and an `i` variable from the previous FOR loop. Reviewing the “Environment” pane, you will note the `i` variable has a `4L` instead of just 4. This denotes that the variable is a whole number without a decimal. In contrast, the `xVal` object has a `1` without the “L.” This means R understands this value to be a decimal or floating-point number. You can check the class difference using the `class` function applied to any object. Notice how the third `class` function call switches the returned value to “numeric” when a decimal is added. Often this distinction is not impactful, but there are times, as you will learn in this book, when functions expect specific object types.
class(i)
class(xVal)
class(i + .01)
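The integer versus numeric distinction can also be explored without relying on objects created earlier. This is a small sketch with hypothetical object names (`wholeNum`, `decimalNum`, `convertedNum`); it uses the base-R functions `class`, `is.integer`, and `as.integer`.

```r
# The L suffix declares an integer; without it R stores a numeric (double)
wholeNum <- 4L
decimalNum <- 4

class(wholeNum)    # "integer"
class(decimalNum)  # "numeric"

# is.integer() tests the storage type, not whether the value is whole
is.integer(decimalNum)  # FALSE

# as.integer() converts a numeric value to the integer class
convertedNum <- as.integer(decimalNum)
class(convertedNum)  # "integer"
```

Note that `decimalNum` holds a whole number yet is still classed as numeric. The class depends on how the value was created, which is why a function expecting an integer may need an explicit conversion.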
In addition to integers and numeric values, common R data types include “Boolean” values, known in R as “logical” object types. Boolean data types are merely TRUE or FALSE. R can interpret these values as occurring or not occurring, as shown in the IF statements. Additionally, for some operations, Boolean values can be interpreted as 1 and 0 for TRUE and FALSE, respectively. For example, in R `TRUE + TRUE` will return a value of `2`, while `TRUE - FALSE` will return `1`, because R interprets the Booleans as 1 - 0. Let’s create a Boolean object called `TFobj` in the code below for use later.
TFobj <- TRUE
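Because TRUE and FALSE are treated as 1 and 0 in arithmetic, a vector of logical values can be summed or averaged. The sketch below assumes a hypothetical vector `gameWon` recording whether each of four games was won; the values are made up for illustration and are not data from the book.

```r
# Hypothetical results: TRUE means the game was won
gameWon <- c(TRUE, FALSE, TRUE, TRUE)

class(gameWon)  # "logical"

# sum() counts the TRUE values because each TRUE is treated as 1
totalWins <- sum(gameWon)
print(totalWins)  # 3

# mean() gives the share of TRUE values, here the win rate
winRate <- mean(gameWon)
print(winRate)  # 0.75
```

Counting and averaging logical vectors in this way avoids writing a loop with an IF statement just to tally outcomes.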
Another data type R often utilizes is a “factor.” A factor is a non-unique description of information. For example, a sports team may be assigned to a conference. Another team may be assigned to that same conference as well.