
Data Analysis With R

Abe edited this page Apr 15, 2022 · 36 revisions

What is R?

R is an open-source statistical programming language that can help with many data analysis tasks. It is well suited to working with probability distributions, running statistical tests and analyses, training and fitting models, prepping and cleaning data, and creating publication-quality data visualizations. It is often seen as a language for academics and scientists who need p-values to get published, but in reality R is a necessary tool for many data science applications and one you should be familiar with.
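To make these tasks concrete, here is a minimal sketch using only base R. The data is simulated and the variable names (`x`, `y`, `fit`) are illustrative assumptions, not part of any real analysis.

```r
set.seed(42)  # make the simulated data reproducible

# Hypothetical data: y depends linearly on x, plus random noise
x <- rnorm(100, mean = 10, sd = 2)
y <- 3 + 0.5 * x + rnorm(100, sd = 1)

# Probability distribution: P(Z <= 1.96) for a standard normal variable
p <- pnorm(1.96)

# Statistical test: one-sample t-test of the null hypothesis mean(x) == 10
tt <- t.test(x, mu = 10)

# Model fitting: ordinary least squares regression of y on x
fit <- lm(y ~ x)

print(p)
print(tt$p.value)
print(coef(fit))
```

Calling `summary(fit)` would additionally print standard errors and p-values for each coefficient.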

Getting Started

You can download R here. Additionally, you can download RStudio, a free (unless you need a commercial license) integrated development environment (IDE) for R that will manage your workspace, let you look up package information and syntax, display plots, and more.

If you are a total beginner to R, this is a basic introduction to R that discusses the different data types, some useful functions, plots, and more. If you have any programming experience, getting started should be straightforward, and even if you are totally new to programming, R is simple enough to learn as a first programming language.
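As a quick taste of the basics, the sketch below creates a few common data types in base R and inspects them. All values and names here are made up for illustration.

```r
# Basic (atomic) types
n <- 3L        # integer
x <- 2.5       # double, which R reports as class "numeric"
s <- "hello"   # character
b <- TRUE      # logical

# Common containers
v  <- c(1, 2, 3)                                   # numeric vector
f  <- factor(c("a", "b", "a"))                     # categorical variable
df <- data.frame(id = 1:3, score = c(90, 85, 70))  # table of columns

# A few useful functions for inspecting objects
print(class(n))
print(class(x))
print(length(v))
print(levels(f))
str(df)
```

From here, `summary(df)` or `plot(df$id, df$score)` are natural next things to try.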

Resources

CRAN

CRAN, or the Comprehensive R Archive Network, is a repository of R software, packages, and documentation. It is also where you downloaded R from initially. One of the greatest benefits of using R is the vast number of packages available through CRAN, which you can easily download and use on your own machine. With over 18,000 packages (at the time this tutorial was written), a bit of searching may turn up a prewritten package made for the specific analysis you want to conduct, saving you time and giving you a starting point to build on.
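Installing a CRAN package is typically a single call to `install.packages()`. The sketch below wraps it in a small helper, `ensure_package()`, which is a hypothetical name introduced here for illustration; it installs a package only when it is not already available. The mirror URL is an assumption and can be swapped for any CRAN mirror.

```r
# Hypothetical helper: install a package from CRAN only if it is missing
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  # Return TRUE if the package can now be loaded, FALSE otherwise
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this call never needs to download anything
ok <- ensure_package("stats")
print(ok)
```

For a package that is not bundled with R, `ensure_package("dplyr")` would download it on first use, after which `library(dplyr)` attaches it to your session.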

Tidyverse

The Tidyverse is a collection of R packages (found on CRAN) that share a consistent philosophy for how to store, manipulate, visualize, (insert data-related verb) data in R. As a result, these packages work together easily and efficiently, which will benefit you and your task greatly. Keeping your data "Tidy" ensures it is compatible with the Tidyverse and makes it easier for others to read, which is in your best interest when collaborating. The best resource for getting familiar with the Tidyverse is R for Data Science, the free online textbook co-authored by Hadley Wickham, the creator of the Tidyverse and most of its core packages. It is also a good beginner R resource in general.
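In tidy data, each variable is a column and each observation is a row. The sketch below reshapes a small, made-up "wide" table into tidy form with `tidyr` and then summarizes it with `dplyr`. The data and column names are hypothetical, and the example assumes both packages are installed; it skips the Tidyverse steps if they are not.

```r
# Hypothetical wide table: one column per year, so "year" is hidden in the headers
wide <- data.frame(
  country = c("A", "B"),
  y2020   = c(10, 20),
  y2021   = c(15, 30)
)

if (requireNamespace("tidyr", quietly = TRUE) &&
    requireNamespace("dplyr", quietly = TRUE)) {
  # Tidy form: one row per country-year observation
  long <- tidyr::pivot_longer(
    wide,
    cols      = -country,   # every column except country
    names_to  = "year",
    values_to = "cases"
  )

  # Tidy data plugs directly into dplyr's verbs
  grouped <- dplyr::group_by(long, country)
  totals  <- dplyr::summarise(grouped, total = sum(cases))

  print(long)
  print(totals)
}
```

In everyday Tidyverse code you would usually call `library(tidyverse)` once and chain these steps with a pipe; the namespace-qualified calls here just keep the example explicit about where each function comes from.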


Issues used in the creation of this page

(Some of these issues may be closed or open/in progress.)
