Download the PDF of the presentation

This tutorial has been forked from awesome classes developed by Adam Wilson here: http://adamwilson.us/RDataScience/

The R Script associated with this page is available here. Download this file and open it (or copy-paste into a new script) with RStudio so you can follow along.

```
x=1
x
```

`## [1] 1`

We can also assign a vector to a variable:

```
x=c(5,8,14,91,3,36,14,30)
x
```

`## [1] 5 8 14 91 3 36 14 30`

And do simple arithmetic:

`x+2`

`## [1] 7 10 16 93 5 38 16 32`
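Arithmetic on a vector is applied element by element, so the same pattern works with other operators and with two vectors of the same length. A minimal sketch, where `w` is a hypothetical second vector introduced only for illustration:

```r
x = c(5, 8, 14, 91, 3, 36, 14, 30)

doubled = x * 2          # multiply every element by 2
w = rep(1, length(x))    # hypothetical second vector of ones, same length as x
total = x + w            # element-by-element addition of two vectors

doubled
total
```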

Create a new variable called `y` and set it to `15`:

`y=15`

Note that R is case sensitive: if you ask for `X` instead of `x`, you will get an error:

```
X
Error: object 'X' not found
```

Naming your variables is your business, but there are 5 conventions to be aware of:

**alllowercase**: *e.g.* `adjustcolor`

**period.separated**: *e.g.* `plot.new`

**underscore_separated**: *e.g.* `numeric_version`

**lowerCamelCase**: *e.g.* `addTaskCallback`

**UpperCamelCase**: *e.g.* `SignatureMethod`

`x`

`## [1] 5 8 14 91 3 36 14 30`

Subset the vector using `x[ ]` notation:

`x[5]`

`## [1] 3`

You can use a `:` to quickly generate a sequence:

`1:5`

`## [1] 1 2 3 4 5`

and use that to subset as well:

`x[1:5]`

`## [1] 5 8 14 91 3`
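The square brackets accept any vector of positions, not only a `:` sequence. As a small sketch (the variable names `first_and_last` and `without_first` are made up for this example), you can pick specific positions with `c()` or drop a position with a negative index:

```r
x = c(5, 8, 14, 91, 3, 36, 14, 30)

first_and_last = x[c(1, length(x))]  # positions 1 and 8
without_first = x[-1]                # a negative index drops that position

first_and_last
without_first
```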

To calculate the mean, you could do it *manually* like this:

`(5+8+14+91+3+36+14+30)/8`

`## [1] 25.125`

Or use a function:

`mean(x)`

`## [1] 25.125`

Type `?functionname` to get the documentation for a function (e.g. `?mean`), or `??"search terms"` (e.g. `??"standard deviation"`) to search the documentation. In RStudio, you can also search in the help panel. `mean` has other arguments too:

`mean(x, trim = 0, na.rm = FALSE, ...)`

In RStudio, if you press `TAB` after a function name (such as `mean(`), it will show the function's arguments.
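To see what one of those extra arguments does, here is a short sketch using `trim`, which, as documented in `?mean`, drops a fraction of the observations from each end of the sorted vector before averaging. The value `0.25` is chosen only for illustration:

```r
x = c(5, 8, 14, 91, 3, 36, 14, 30)

mean(x)                          # ordinary mean of all 8 values

# trim = 0.25 removes the 2 smallest and 2 largest of the 8 values,
# leaving 8, 14, 14, 30 to be averaged
trimmed = mean(x, trim = 0.25)
trimmed
```

The trimmed mean is much less affected by the single large value (`91`) than the ordinary mean.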

Calculate the standard deviation of `c(3,6,12,89)`.

```
y=c(3,6,12,89)
sqrt((sum((y-mean(y))^2))/(length(y)-1))
```

`## [1] 41.17038`

```
#or
sd(y)
```

`## [1] 41.17038`

```
#or
sd(c(3,6,12,89))
```

`## [1] 41.17038`

Writing functions in R is pretty easy. Let’s create one to calculate the mean of a vector by getting the sum and length. First think about how to break it down into parts:

```
x1= sum(x)
x2=length(x)
x1/x2
```

`## [1] 25.125`

Then put it all back together and create a new function called `mymean`:

```
mymean=function(f){
sum(f)/length(f)
}
mymean(f=x)
```

`## [1] 25.125`

Confirm it works:

`mean(x)`

`## [1] 25.125`

Any potential problems with the `mymean` function? Consider a vector that contains missing (`NA`) values:

`x3=c(5,8,NA,91,3,NA,14,30,100)`

What do you think `mymean(x3)` will return?

Calculate the mean using the new function:

`mymean(x3)`

`## [1] NA`

Use the built-in function (with and without `na.rm=T`):

`mean(x3)`

`## [1] NA`

`mean(x3,na.rm=T)`

`## [1] 35.85714`

Writing simple functions is easy; writing robust, reliable functions can be hard…
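One way to make `mymean` a little more robust is to give it its own `na.rm` argument. The sketch below is only one possible approach; `mymean2` is a hypothetical name, and the `na.rm` argument simply mirrors the one in the built-in `mean()`:

```r
# mymean2 is a hypothetical name used for illustration
mymean2 = function(f, na.rm = FALSE){
  if (na.rm) {
    f = f[!is.na(f)]   # keep only the non-missing values
  }
  sum(f) / length(f)
}

x3 = c(5, 8, NA, 91, 3, NA, 14, 30, 100)
mymean2(x3)                # still NA by default, like mean(x3)
mymean2(x3, na.rm = TRUE)  # agrees with mean(x3, na.rm = TRUE)
```

Keeping `na.rm = FALSE` as the default means missing values are never dropped silently; the caller has to ask for it.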

R also has standard conditional tests that generate `TRUE` or `FALSE` values (which also behave as `1`s and `0`s). These are often useful for filtering data (e.g. identifying all values greater than 5). The comparison operators are `<`, `<=`, `>`, `>=`, `==` for exact equality, and `!=` for inequality.

`x3`

`## [1] 5 8 NA 91 3 NA 14 30 100`

`x3 > 75`

`## [1] FALSE FALSE NA TRUE FALSE NA FALSE FALSE TRUE`

`x3 == 40`

`## [1] FALSE FALSE NA FALSE FALSE NA FALSE FALSE FALSE`

`x3 > 15`

`## [1] FALSE FALSE NA TRUE FALSE NA FALSE TRUE TRUE`

And you can perform operations on those results:

`sum(x3>15,na.rm=T)`

`## [1] 3`

or save the results as variables:

```
result = x3 > 3
result
```

`## [1] TRUE TRUE NA TRUE FALSE NA TRUE TRUE TRUE`
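Since these tests are described as useful for filtering, here is a brief sketch of how a logical vector can be used to pull out the matching values. The names `big`, `positions`, and `values` are made up for this example:

```r
x3 = c(5, 8, NA, 91, 3, NA, 14, 30, 100)

big = x3 > 15                     # logical vector, NA where x3 is NA
positions = which(big)            # indices where the test is TRUE; NAs are skipped
values = x3[big & !is.na(big)]    # the matching values, with NAs excluded

positions
values
```

Subsetting with `x3[big]` alone would also return an `NA` for each missing value, which is why the sketch combines the test with `!is.na(big)`.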

Define a function that counts how many values in a vector are less than or equal to (`<=`) 12.

```
mycount=function(x){
sum(x<=12)
}
```

Try it:

`x3`

`## [1] 5 8 NA 91 3 NA 14 30 100`

`mycount(x3)`

`## [1] NA`

Oops! The `NA` values propagate through `sum()`, so the result is `NA`. Add `na.rm=T` to ignore them:

```
mycount=function(x){
sum(x<=12,na.rm=T)
}
```

Try it:

`x3`

`## [1] 5 8 NA 91 3 NA 14 30 100`

`mycount(x3)`

`## [1] 3`

Nice!
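The cutoff of 12 is hard-coded in `mycount`. As a possible extension (a sketch only; `count_at_most` and its `threshold` argument are hypothetical names), the cutoff can become an argument with a default value:

```r
# count_at_most is a hypothetical name; threshold defaults to 12 as in the exercise
count_at_most = function(x, threshold = 12){
  sum(x <= threshold, na.rm = TRUE)
}

x3 = c(5, 8, NA, 91, 3, NA, 14, 30, 100)
count_at_most(x3)        # uses the default cutoff of 12
count_at_most(x3, 30)    # counts 5, 8, 3, 14 and 30
```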

There are many ways to generate data in R such as sequences:

`seq(from=0, to=1, by=0.25)`

`## [1] 0.00 0.25 0.50 0.75 1.00`

and random numbers that follow a statistical distribution (such as the normal):

`a=rnorm(100,mean=0,sd=10)`

Let’s visualize those values in a histogram:

`hist(a)`

We’ll cover much more sophisticated graphics later…
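The values drawn by `rnorm()` change every time the code is run. If you need the same "random" numbers each time (for example, to reproduce a result), one common approach is to call `set.seed()` first. In this sketch the seed `42` is arbitrary, and `a1` and `a2` are names made up for the example:

```r
set.seed(42)                        # 42 is an arbitrary seed chosen for illustration
a1 = rnorm(100, mean = 0, sd = 10)

set.seed(42)                        # resetting the seed replays the same draws
a2 = rnorm(100, mean = 0, sd = 10)

identical(a1, a2)                   # TRUE: both vectors hold the same values
```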

You can also use matrices (2-dimensional arrays of numbers):

```
y=matrix(1:9,ncol=3)
y
```

```
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
```

Matrices behave much like vectors:

`y+2`

```
##      [,1] [,2] [,3]
## [1,]    3    6    9
## [2,]    4    7   10
## [3,]    5    8   11
```

and have 2-dimensional indexing:

`y[2,3]`

`## [1] 8`
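Leaving one of the two indices empty selects a whole row or column. A short sketch, using a matrix `m` (a name chosen for this example) with the same values as `y`:

```r
m = matrix(1:9, ncol = 3)   # same values as the matrix y

m[2, ]    # the whole second row: 2 5 8
m[, 3]    # the whole third column: 7 8 9
dim(m)    # number of rows and columns: 3 3
```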

Create a 3x3 matrix full of random numbers. Hint: `rnorm(5)` will generate 5 random numbers.

`matrix(rnorm(9),nrow=3)`

```
##            [,1]       [,2]      [,3]
## [1,] -0.6385807 -1.1242050 0.1428754
## [2,] -0.6640297 -0.4913995 0.3565074
## [3,] -0.5751638  0.1669154 0.5599687
```

Data frames are similar to matrices, but more flexible. All elements of a matrix must be the same type (e.g. all numbers), while a data frame can include columns of different data types (e.g. text, factors, numbers). Data frames are commonly used when doing statistical modeling in R.

```
data = data.frame( x = c(11,12,14),
                   y = c("a","b","b"),
                   z = c(T,F,T))
data
```

```
##    x y     z
## 1 11 a  TRUE
## 2 12 b FALSE
## 3 14 b  TRUE
```

You can subset a column in several ways:

`mean(data$x)`

`## [1] 12.33333`

`mean(data[["x"]])`

`## [1] 12.33333`

`mean(data[,1])`

`## [1] 12.33333`
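The three forms above all select a column. Rows can be selected too, by putting a logical test before the comma. A minimal sketch using the same data frame (the names `flagged` and `large_x` are made up for this example):

```r
data = data.frame(x = c(11, 12, 14),
                  y = c("a", "b", "b"),
                  z = c(TRUE, FALSE, TRUE))

flagged = data[data$z, ]            # rows where z is TRUE (rows 1 and 3)
large_x = data[data$x > 11, "x"]    # x values greater than 11

flagged
large_x
```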

To load a package that is already installed, use `library(packagename)`. To install a new package, use `install.packages()` or the package manager.

`library(raster)`

R may ask you to choose a CRAN mirror. CRAN is the distributed network of servers that provides access to R’s software. It doesn’t really matter which you choose, but closer ones are likely to be faster. From RStudio, you can select the mirror under Tools→Options or just wait until it asks you.

If you don’t have the package above, install it in the package manager or by running `install.packages("raster")`.
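If you want a script to install a package only when it is missing, one common pattern is to check with `requireNamespace()` first. The helper below is a sketch under that assumption; `ensure_package` is a hypothetical name, not a function that ships with R:

```r
# ensure_package is a hypothetical helper name, not part of R
ensure_package = function(pkg){
  already_installed = requireNamespace(pkg, quietly = TRUE)
  if (!already_installed) {
    install.packages(pkg)   # needs an internet connection and a CRAN mirror
  }
  invisible(already_installed)   # TRUE if nothing had to be installed
}

ensure_package("stats")   # stats ships with R, so nothing is installed here
```

Calling `ensure_package("raster")` would install `raster` only if it is not already available; you would still call `library(raster)` afterwards to load it.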