Sports Analytics in Practice with R. Ted Kwartler
work at all. It’s just some crap some people who were really smart made up.
Charles Barkley, former NBA player
Just because you don’t understand something doesn’t mean it’s crap.
Ross Drucker, NBA Future Analytics Stats Program Analyst
My dear Nora & Brenna,
My inspiration and guides. I wrote this book in your honor though don’t expect either of you to follow my footsteps into analysis. Your journey is your own, may you find a passion and, if desirable, have the opportunity to write about it. No matter where your attention and intellect lead you I remain.
Your loving father,
Ted
Foreword
Writing a book is no easy task, yet for some reason I decided to write a second! Overall, I am grateful to the countless people who helped me learn, expand, and apply these methods. Data science and analytics is as much a “team sport” as any, where collaboration, communication, and effort often win the day.
First I would like to acknowledge Jack W, whose intellect and athleticism left us far too early. For anyone struggling with mental health, know that you are loved, you are valuable, and people in your community are here for you. Your passing was a motivating reminder of the short time we have to make contributions along with the need for more kindness toward those that may be suffering silently.
Next, Anup B, one of the most brilliant, supportive leaders I have worked for. Not to mention, your passion for cricket helped open my eyes to a noteworthy and enjoyable sport. Losing you to the pandemic was a disturbing blow felt by many people who were touched by your intelligence, humor, and positivity.
This entire book would not have been possible without the fine professors at the University of Notre Dame that put me on my own professional journey. I fondly remember building my first logistic regression predicting March Madness after learning these techniques from Dr. Keating, the late Dr. Gilbride, and Dr. Devaraj.
Further, I would like to acknowledge my parents, Anatol and Trish, and my endearing wife, Meghan. Your support and patience have been significant. Writing a book is no small undertaking, with much of the logistical burden falling to each of you. Completing this book is a shared victory.
Lastly, my sincerest gratitude to the wonderful team at Wiley, particularly Kimberly Monroe-Hill. Your patience and flexibility with late submissions and delayed seasons stemming from the unusual 2020 year in sports (among other more important hardships) have been greatly appreciated. I was ready to give up on the project, yet your e-mails demonstrated a commitment from Wiley that I cherish.
1 Introduction to R
Objectives
Learn about R as a programming language
Define Integrated Development Environment
Define objects
Learn the assignment operator
Define functions
Executing a loop
Learn logical operators
Learn about R data types
Learn about object classes
Indexing data objects
Extending R functionality with packages
Writing a custom function
Create a scatter plot with sports data
Create a heatmap with sports data
R Libraries
ggplot2 ggthemes RCurl tidyr
R Functions
+ plot <- round class as.factor as.character c cbind rbind data.frame as.matrix as.data.frame install.packages library getURL read.csv dim names head tail summary table qplot pivot_longer geom_tile scale_fill_gradient xlab ggtitle theme theme_hc
The R Programming Language
R is an open-source, freely available programming language used throughout this book. R is a powerful and longstanding programming language developed more than 20 years ago. It originated in the 1990s as a derivative of the “S” programming language for statistics, which was developed at Bell Labs under AT&T and, later, Lucent Technologies. Unlike general-purpose programming languages, R is optimized specifically for statistics, including but not limited to simulation, machine learning, visualization, and traditional statistical modeling (such as linear regression) and tests. Because R is open source, many developers, academics, and enthusiasts have contributed to its development for their specific needs. As a result, the language is extensible, meaning it can be readily adapted to various purposes. For example, through R Markdown, simple websites and presentations can be created. In another use case, R can be used for traditional linear modeling or machine learning and can draw upon various data types for analysis, including audio files, digital images, text, numeric data, and various other file types. Thus, it is widely used and nonspecialized, other than to say that R is an analysis language. This differs from languages such as Ruby, which specializes in web development, or Python, which has extended its functionality to building applications, not just analysis.
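As a rough illustration of the simulation and traditional linear modeling mentioned above, the following minimal sketch uses only base R. The numbers are simulated rather than real sports data, and the variable names (shot_attempts, points_scored, games) are made up for this example.

```r
# Simulated (not real) data: shot attempts and points for 50 hypothetical games.
set.seed(42)
shot_attempts <- round(runif(50, min = 70, max = 100))
points_scored <- 20 + 0.9 * shot_attempts + rnorm(50, mean = 0, sd = 5)
games <- data.frame(shot_attempts, points_scored)

# Traditional linear model: points scored as a function of shot attempts.
fit <- lm(points_scored ~ shot_attempts, data = games)

coef(fit)      # estimated intercept and slope
summary(fit)   # standard errors, t-tests, and R-squared
```

Because the data are generated with a known slope of 0.9, the estimated slope can be compared against that value to see how well the model recovers it.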
In this textbook, the R language is applied specifically to sports contexts. Of course, the code in this book can be used to extend your understanding of sports analytics. It may give you insight into a particular sport or an analytical aspect within the sport itself, such as which statistics to focus on to win a basketball game. However, learning the code in this book can also help open up a world of analytical capabilities beyond sports. One of the benefits of learning statistics, programming, and various analysis methods with sports data is that the data is widely available and outcomes are known. This means that your analysis, models, and visualizations can be applied, and you can review the outcomes as you expand upon what is covered in this book. This differs from other programming and statistical examples, which may resort to boring, synthetic data to illustrate an analytical result. Using sports data is realistic and can be future oriented, making the learning more challenging yet engaging. Modeling the survivors of the Titanic pales in comparison since you cannot change the historical outcome or save future cruise ship mates. Thus, modeling which team will win a match or which player is a good draft pick is a superior learning experience.
If you are new to programming, don’t be intimidated. R is a forgiving language in that things like spacing and indentation are ignored. Further, R is well supported by its community, and a simple online search of any error message usually finds an answer quickly on any number of sites.
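The forgiving treatment of spacing and indentation can be seen in a short sketch. The object names below (points_a, total_flat, and so on) are invented for illustration and use made-up point totals.

```r
# Three ways of writing the same assignment; R ignores the extra spaces.
points_a <- c(21, 34, 18)
points_b<-c(21,34,18)
points_c   <-   c( 21,   34,   18 )

identical(points_a, points_b)   # TRUE
identical(points_a, points_c)   # TRUE

# Indentation is ignored too: both loops compute the same total.
total_flat <- 0
for (p in points_a) {
total_flat <- total_flat + p
}

total_indented <- 0
for (p in points_a) {
    total_indented <- total_indented + p
}

total_flat == total_indented    # TRUE
```

One caveat worth knowing: a space placed inside an operator changes its meaning, so `x<-3` assigns 3 to x while `x < -3` compares x with negative 3. Consistent spacing and indentation remain good habits for readability even though R does not require them.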
To begin your R and sports analytics journey, please download the “base-R” distribution for your operating system. The “Comprehensive R Archive Network,” CRAN, is the home of the official R distribution as well as officially supported packages (more on that in a bit). The site to download base-R is https://cran.r-project.org.
Unfortunately, base-R, having started in the nineties, looks abysmal and lacks some modern-day functionality. Thus, you will next need to download the RStudio Integrated Development Environment, or IDE. An IDE is software that consolidates many of the aspects needed to code into one place. For example, you need a place to write code (which could be done in a simple notepad-like program), a place to execute the code you have written, a place to view the plots output by the code, and so on. These individual components are assembled into the IDE for ease of use and fast development. R and many other languages have IDEs. In fact, R has multiple IDEs optimized for the type of analysis you are performing, such as biostatistics, or for working with another language like Java. The most popular and easily supported IDE for