Starting to Use R
Website Version
This tutorial is a mixture of R code chunks and explanations of the code. The R code chunks will appear in boxes.
Below is an example of a chunk of R code:
# This is a chunk of R code. All text after a # symbol is a comment
# Set working directory using setwd() function
setwd('Enter the path to my working directory')
# Clear all variables in R's memory
rm(list=ls()) # Standard code to clear R's memory
Sometimes the output from running this R code will be displayed after the chunk of code.
Here is a chunk of code followed by the R output
2 + 4 # Use R to add two numbers
[1] 6
Objectives
The objectives of this tutorial are:
- Provide an overview of R’s basic functions
- Describe the types of data R understands
- Introduce arrays of numbers
The three foundations
There are three types of instructions that you will be giving R:
Command type | Description | Terminology | Example |
---|---|---|---|
Doing | These instruct R to do something | All Functions and Operators | Addition, 2+2 |
Naming | These tell R to assign a name to something | <- or = | four = 2+2 |
Controlling | These control when R executes a command | For-loops and If-statements | |
We will primarily be using the doing and naming commands.
Doing Commands (Functions and Operators)
Functions
A function has a name followed by parentheses. When R recognises a function it does the calculation associated with the function. Here are some examples of functions:
Function name | What does the function do | Example |
---|---|---|
c() | Group together some numbers | c(1,2,3,4,5) |
mean() | Calculate the mean of a group of numbers | mean(c(1,2,3,4,5)) |
sum() | Add up a group of numbers | sum(c(1,2,3,4,5)) |
max() | Find the maximum value from a group of numbers | max(c(1,2,3,4,5)) |
log10() | Calculate the logarithm (base 10) of a number | log10(100) |
citation() | Display how to cite R in a report | citation() |
Arguments
Some functions need information to be able to do their calculation. For example, the function mean() needs a group of numbers before it can calculate the mean. This information is given inside the parentheses. Each item of information is called an argument.
R’s help pages show all the arguments you can give to a function. E.g. type ?mean to see the arguments for the function mean().
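As a small illustration of arguments, the sketch below gives one argument to mean() and two arguments to round(). The name heights and its values are made up for this example and are not part of the tutorial's data.

```r
# Hypothetical data: heights (cm) of five plants
heights <- c(12.1, 15.3, 9.8, 11.0, 14.2)

# mean() is given one argument: the group of numbers
mean_height <- mean(heights)

# round() is given two arguments, separated by a comma:
# the number to round and the named argument 'digits'
rounded_height <- round(mean_height, digits = 1)

mean_height      # Display the mean
rounded_height   # Display the mean rounded to 1 decimal place
```

Naming an argument (as in digits = 1) makes it clear which piece of information is being supplied.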
Operators
Operators do simple calculations, such as arithmetic.
Arithmetic operators
R is really good at doing arithmetic.
Have a go at typing in these calculations at the command prompt.
Remember the command prompt is the > in the console window.
2+3 # 2 plus 3 (addition operator)
2-3 # 2 minus 3 (subtraction operator)
2*3 # 2 multiplied by 3 (multiplication operator)
2/3 # 2 divided by 3 (division operator)
2^3 # 2 to the power of 3 (power operator)
Logical operators
R can tell you whether statements are TRUE or FALSE. These are called **logical statements**.
Here is an example of typing some logical statements at the command prompt (these use the “less than”, <, and “greater than”, >, symbols)
Operator Symbol | Definition |
---|---|
== | equal to (note the double equals sign) |
!= | not equal to |
> | greater than |
>= | greater than or equal to |
< | less than |
<= | less than or equal to |
# ****************************************************
# Examples of relational calculations -------------------
2 == 3 # Is 2 equal to 3?
[1] FALSE
2 < 3 # Is 2 less than 3?
[1] TRUE
Here are some more logical statements for you to try
2 != 3 # Is 2 not equal to 3?
'A' == 'B' # Is 'A' equal to 'B'?
'A' == 'a' # Is 'A' equal to 'a'?
Have a go
Here are some examples of mathematical functions and operators to try…
# ****************************************************
# Mathematical functions -----------------------------
cos(2/3) # cosine of 2/3
exp(2) # exponential function of 2
log10(2) # logarithm (base 10) of 2
log(2) # logarithm (base e) of 2
sqrt(2) # square-root of 2
2^0.5 # square-root of 2
3^2 # 3 to the power of 2
round(5/3, digits=3) # round 5/3 to 3 decimal places
signif(5/3, digits=3) # round 5/3 to 3 significant figures
floor(5/3) # round 5/3 to the largest integer less than 5/3
abs(-1.4) # absolute value, ignore the minus sign
Naming Commands (Assigning)
In R you can give anything a name by using the assignment operator <-. Giving a name is called assignment.
Here is an example where the result of the calculation 2 + 3 is given the name a (you can use almost any name you want)
a <- 2 + 3 # Assign the output the name 'a'
a # Display the value of 'a'
[1] 5
Once a name is assigned it can be used in place of the result. For example,
a / 10 # Use the name 'a' in a calculation
[1] 0.5
Names can also be assigned using =. The = and <- assignment operators are almost identical in how they work.
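A brief sketch showing both assignment operators giving the same result. The names p and q are hypothetical and chosen only for this example.

```r
p <- 2 + 3   # Assign the result using <-
q = 2 + 3    # Assign the result using =

p == q       # Do both names hold the same value?
```

One difference is that inside a function's parentheses = is used to name an argument (as in round(5/3, digits=3)), which is one reason many R users prefer <- for assignment.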
Grouping numbers together (arrays)
We will often want to group numbers together (e.g. a group of observations).
A collection of several numbers is called an array. An array allows one name to be assigned to a group of numbers.
For example, the whole numbers from 1 to 10 could be combined together as an array
c(1,2,3,4,5,6,7,8,9,10) # An array of numbers
[1] 1 2 3 4 5 6 7 8 9 10
You can find more information on working with arrays at http://DrJonYearsley.github.io/Resources/Manipulate_Data_WebVersion.html
Creating an array
An array can be created using the c() function, which combines several numbers (this is known as the combine function)
b <- c(1,2,3,5,7) # An array of five numbers (1 and the first four primes)
b # Display the value of variable 'b'
[1] 1 2 3 5 7
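Once an array has a name, that name can be used wherever the group of numbers is needed. The sketch below re-creates the array b so that it runs on its own, then applies some of the functions from the table above.

```r
b <- c(1,2,3,5,7)   # The same array as in the example above

sum(b)      # Add up all the numbers in b
mean(b)     # Mean of the numbers in b
max(b)      # Largest number in b
length(b)   # How many numbers are in b
b * 2       # Arithmetic is applied to every element of b
```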
An array of whole numbers can be quickly created using a colon (:)
c(5:15) # An array of whole numbers from 5 to 15
[1] 5 6 7 8 9 10 11 12 13 14 15
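The colon also works on its own, without c(). The sketch below shows this, together with the seq() function for sequences that do not go up in steps of one. The name n is hypothetical.

```r
n <- 5:15                        # Whole numbers from 5 to 15, no c() needed
n                                # Display the array
length(n)                        # How many numbers are in the array

seq(from = 5, to = 15, by = 2)   # Every second whole number from 5 to 15
```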
Data types in R
Data can be broadly divided into two types (quantitative and qualitative). Within these two types R makes finer distinctions.
The main data types in R are called:
- Quantitative data:
  - Real numbers: numeric (num)
  - Whole numbers (integers): integer (int)
- Qualitative data:
  - Text (characters & strings): character (chr)
  - Logicals (booleans): logical (logi)
  - Factors (categorical): factor
Quantitative Data Types
A typical number (a real number) is given the data type num (num stands for numeric). A real number is quantitative data.
You can see the data type of a variable using the str() function (this function displays the structure of the variable).
x1 <- 2.45 # Numerical (R's data type='num')
str(x1)
num 2.45
A whole number (an integer) can be given the data type num, or it can be explicitly distinguished as a whole number using the data type int (int stands for integer). A whole number is quantitative data.
x2 <- 5 # Numerical whole number (R's data type='num')
str(x2)
 num 5
x3 <- as.integer(5) # Numerical whole number (R's data type='int')
str(x3)
 int 5
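Another way to ask for the int data type is to add the suffix L to a whole number. The sketch below shows this and uses is.integer() and is.numeric() to check the result. The name x_whole is hypothetical.

```r
x_whole <- 5L          # The suffix L asks R to store 5 as an integer
str(x_whole)           # Display the structure of x_whole

is.integer(x_whole)    # Is x_whole stored as an integer?
is.integer(5)          # Is a plain 5 stored as an integer?
is.numeric(x_whole)    # Integers also count as numeric
```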
Qualitative Data Types
Text (either a single letter/number or a series of letters/numbers) is given the data type chr (chr stands for character). A character is qualitative data (see the section labelled ‘Factor’).
x4 <- 's' # Character (R's data type='chr')
str(x4)
 chr "s"
x5 <- 'hello' # Character string (R's data type='chr')
str(x5)
 chr "hello"
Logical variables (i.e. variables that can only take the values TRUE or FALSE) are given the data type logi (logi stands for logical).
x6 <- TRUE # Logical (TRUE/FALSE) (R's data type='logi')
str(x6)
 logi TRUE
x7 <- NA # A missing value (R explicitly recognises missing data)
str(x7)
 logi NA
A factor is a qualitative (categorical) variable whose values come from a list of names, called levels. Such variables are given the data type Factor.
# An array of place names as a factor
x8 <- as.factor(c('Dublin','Cork','Galway'))
str(x8)
Factor w/ 3 levels "Cork","Dublin",..: 2 1 3
Factors will be important when we start to analyse data because there is an important distinction between quantitative data and qualitative data.
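The numbers 2 1 3 in the str() output for a factor can look puzzling. The sketch below, which uses the hypothetical name places for the same three place names, suggests where they come from: R stores the distinct names (the levels) in alphabetical order by default and records each element as a position in that list.

```r
places <- as.factor(c('Dublin','Cork','Galway'))

levels(places)       # The distinct names, sorted alphabetically
nlevels(places)      # How many distinct names there are
as.integer(places)   # The position of each element in the list of levels
```

Here 'Dublin' is the second level, 'Cork' the first and 'Galway' the third, which gives the codes 2 1 3.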
Missing data, NA
Missing data are data points that could not be recorded for some reason. Missing data is an important type of data that R explicitly recognizes. R uses the value NA to represent missing data. Missing data should not be set to a value (e.g. 0) because this can be misinterpreted as being the value zero!
Missing data should be included in data sets and explicitly represented. For example, if we had failed to record the number 5 in a data set of whole numbers it would be represented as
# Explicitly record missing data as NA
k <- c(1,2,3,4,NA,6,7)
R must account for missing data when performing calculations. By default, most calculations that include missing data return NA instead of a numerical answer.
This means you must tell R to remove missing data before performing a calculation. Many of R’s statistical functions have an argument na.rm=TRUE which tells R to remove missing values before performing a calculation.
If a calculation produces the answer NA then try using na.rm=TRUE
mean(k, na.rm=TRUE) # Calculate mean after removing missing data
[1] 3.833333
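To see the effect of missing data more directly, the sketch below re-creates the array k and compares a calculation with and without na.rm=TRUE. It also uses is.na(), which tests each element for a missing value.

```r
k <- c(1,2,3,4,NA,6,7)   # The same array as above, with one missing value

mean(k)                  # Without na.rm the answer is NA
is.na(k)                 # TRUE where a value is missing
sum(is.na(k))            # Count the missing values
sum(k, na.rm = TRUE)     # Add up the values that were recorded
```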
Logical values
A logical value can be either TRUE or FALSE.
Single logical expressions
Here are some examples of logical expressions (using some of the operators from above) and their logical output
2.5 > 1 # Is 2.5 greater than 1?
[1] TRUE
-1 <= 3 # Is -1 less than or equal to 3?
[1] TRUE
Some more for you to try…
5 == 2 # Is 5 equal to 2 (NOTE: logical equals is ==)?
5 != 2 # Is 5 not equal to 2?
Logical calculations can be performed on an array of numbers. In this example we use the array called b from above,
b != 5 # Is each element of b not equal to 5?
[1] TRUE TRUE TRUE FALSE TRUE
Combining logical expressions
Logical expressions can be combined. There are three basic operations for this (Table 2):
Symbol | Definition |
---|---|
! | logical NOT |
& | logical AND |
\| | logical OR |
An example of an AND statement
# Combine two logical expressions using & (logical AND)
(b!=5) & (b>2) # b not equal to 5 AND b greater than 2
[1] FALSE FALSE TRUE FALSE TRUE
An example of an OR statement
# Combine two logical expressions using | (logical OR)
(b!=5) | (b>2) # b not equal to 5 OR b greater than 2
[1] TRUE TRUE TRUE TRUE TRUE
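The third operation, logical NOT, swaps TRUE and FALSE. The sketch below re-creates the array b so that it runs on its own. The name not_big is hypothetical.

```r
b <- c(1,2,3,5,7)        # The same array as in the examples above

# Logical NOT: TRUE wherever b is NOT greater than 2
not_big <- !(b > 2)
not_big

# NOT (b greater than 2) is the same as b less than or equal to 2
all(not_big == (b <= 2))
```

The parentheses around b > 2 make it clear which expression the ! applies to.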
The continuation prompt, +
If we send only part of a command to R it will recognize that the command is incomplete by displaying the continuation prompt (the symbol +). This prompt means R is waiting for the rest of the command.
To remove a continuation prompt and return to the command prompt, press the Esc key.
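A command can also be spread over several lines on purpose. In the sketch below, the first two lines are incomplete on their own, so typing them one at a time at the console would show the continuation prompt until the closing parenthesis is entered. The name total is hypothetical.

```r
# The command is only complete once the closing parenthesis is typed
total <- sum(1,
             2,
             3)

total   # Display the result
```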
Large and small numbers (exponents)
R uses an exponent notation to display very large and very small numbers.
A 70 kg human has roughly 30 trillion cells in their body (that is 3 x 10^13 in scientific notation). That’s a large number. R replaces the x 10^13 with e+13
# Number of cells in a 70 kg human
30000000000000
[1] 3e+13
The average weight of a human cell is 70 kg divided by 30 trillion. That’s a very small number. Very small numbers are also represented using the same notation (the e in e-12 stands for exponent).
In a scientific report 2.3e-12 should be written in scientific notation as 2.3 x 10^-12.
# Average weight of a human cell (kg)
70 / 3e13
[1] 2.333333e-12
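The exponent notation can also be used when typing numbers into R. The sketch below uses the hypothetical names cells and cell_weight to repeat the calculation above, and shows format() and signif() as two ways of controlling how the numbers are displayed.

```r
cells <- 3e13                        # The same as typing 30000000000000
cells == 30000000000000              # Check the two ways of writing it agree

cell_weight <- 70 / cells            # Average weight of a human cell (kg)

format(cells, scientific = FALSE)    # Display the number written out in full
signif(cell_weight, digits = 2)      # Round to 2 significant figures
```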
Summary of topics
- Doing commands (functions and operators)
- Assigning names (assignment)
- Arrays
- R’s data types
- Missing data
- Logical Variables
- Exponential notation for large and small numbers
Further Reading
All these books can be found in UCD’s library.
- Andrew P. Beckerman and Owen L. Petchey, 2012 Getting Started with R: An introduction for biologists (Oxford University Press, Oxford) [Chapter 1, 2]
- Mark Gardner, 2012 Statistics for Ecologists Using R and Excel (Pelagic, Exeter) [Chapter 3]
- Michael J. Crawley, 2015 Statistics: an introduction using R (John Wiley & Sons, Chichester) [Appendix]
- Tenko Raykov and George A Marcoulides, 2013 Basic statistics: an introduction with R (Rowman and Littlefield, Plymouth)
- John Verzani, 2005 Using R for introductory statistics (Chapman and Hall, London) [Chapter 1]