Create data in R

Sequences

It is easy to create patterned data in R. For example, a sequence of numbers from 1 to 10:

1:10 
 [1]  1  2  3  4  5  6  7  8  9 10

More complicated sequences can be created using a combination of the colon : and other operators, but the function seq() provides a more versatile way of creating patterned data in a single step
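As a brief sketch of what combining the colon with other operators can look like, the lines below build three patterned vectors. The object names (evens, countdown, offset) are made up for this illustration:

```r
# Combine the colon operator with arithmetic to build patterned sequences
evens <- (1:5) * 2     # multiply each element by 2: 2 4 6 8 10
countdown <- 10:1      # the colon also counts downwards: 10 9 ... 1
offset <- 1:5 + 0.5    # the colon is evaluated before +: 1.5 2.5 3.5 4.5 5.5

evens
countdown
offset
```

Because the colon is evaluated before arithmetic operators, the parentheses in (1:5) * 2 are optional, but they make the intent easier to read.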

Look at the R help file on this function:

?seq

Under Usage, you can see the parameters that R expects for this function

A simple example of a sequence starting at 0 (from parameter), ending at 25 (to parameter), incrementing by 5 at each step (by parameter):

seq(0, 25, 5) 
[1]  0  5 10 15 20 25
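The same call can be written with the parameters named, which may make the code easier to read. The second call is an extra illustration, using the length.out parameter of seq() to ask for a fixed number of evenly spaced values rather than a step size. The object names are made up for this sketch:

```r
# Same sequence as seq(0, 25, 5), with each parameter named explicitly
by_fives <- seq(from = 0, to = 25, by = 5)
by_fives

# Five evenly spaced values between 0 and 1: 0.00 0.25 0.50 0.75 1.00
quarters <- seq(from = 0, to = 1, length.out = 5)
quarters
```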

Sites

Here we use an underscore _ to create more complex object names, as recommended by the tidyverse style guide. The style guide recommends using only lowercase letters and numbers for object names, with underscores _ to separate words within a name

sites_main <- "gabon"
sites_secondary <- c("drc", "congo-brazzaville", "equatorial guinea", "niger")
sites_all <- c(sites_main, sites_secondary)
sites_all
[1] "gabon"             "drc"               "congo-brazzaville"
[4] "equatorial guinea" "niger"            

Mixing data types

Now see what happens if you add numeric values on the end of your vector of site names:

sites_num <- c(sites_all, 3, 5)
sites_num
[1] "gabon"             "drc"               "congo-brazzaville"
[4] "equatorial guinea" "niger"             "3"                
[7] "5"                

The formatting of the numbers in sites_num and in v differs. A vector can hold only one data type (here, numeric or text), so R converts the numbers to text, which it displays enclosed in double quotes
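One way to check how R is storing a vector is the class() function. The sketch below uses two short, made-up vectors (mixed and numbers_only) rather than the objects created above:

```r
mixed <- c("gabon", 3, 5)    # text and numbers combined in one vector
class(mixed)                 # "character": the numbers were converted to text

numbers_only <- c(3, 5)      # numbers on their own
class(numbers_only)          # "numeric"
```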

Trying to perform a mathematical operation on sites_num makes this clearer. For example, sites_num * 2 gives you an error message explaining that you attempted to pass a non-numeric value to a mathematical operator (in this case, multiplication *)

sites_num * 2

Add a comment to your code - remember R doesn’t process any text after the #

sites_num * 2 # Multiplying no longer works because sites_num now contains text
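If you would like to look at the error message without stopping a longer script, one option is to wrap the call in tryCatch(). This is only a sketch: the vector below is a shortened stand-in for sites_num, and msg is a made-up name. In an English-language R session the stored message should read "non-numeric argument to binary operator".

```r
sites_num <- c("gabon", "drc", 3, 5)   # shortened stand-in for the vector above

# tryCatch() evaluates the expression; if it fails, the error is handed to
# the function supplied as `error`, which here returns the message as text
msg <- tryCatch(
  sites_num * 2,
  error = function(e) conditionMessage(e)
)
msg
```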

Generate example data

More complex datasets can be created by joining, or binding, vectors together

For example, you might expect to see different home range sizes in male and female rodents. You can create two vectors, one for each sex, and then concatenate them into a single vector

Here we generate home range data from normal distributions with different means and standard deviations:

male_home_ranges <- rnorm(15, 10.3, 2.6)
female_home_ranges <- rnorm(15, 6.4, 3.8)
home_ranges <- c(male_home_ranges, female_home_ranges)
home_ranges
 [1]  9.7282333  6.7469984 12.6267899 10.7824351 13.4609852  3.9309954
 [7]  9.3725726 10.9279155 10.5409802 10.7114054 11.4166757 11.7136811
[13] 11.0829805 11.4917988 10.4810992  3.8741758 10.2654706  3.7194361
[19]  3.9212347  4.1974820 10.7515616  4.0115849  6.3316017 12.4498489
[25]  8.2080928  4.1013051  8.1846088 -0.1289359  4.4171378  1.3896540
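Because rnorm() draws random numbers, your values will differ from those printed above, and will change each time you run the code. If you need the same values every time, one common approach is to call set.seed() first. The sketch below names the mean and sd parameters of rnorm() for readability; the seed value 42 and the object names are arbitrary choices for this illustration:

```r
set.seed(42)   # arbitrary seed, chosen only for illustration
first_draw <- rnorm(15, mean = 10.3, sd = 2.6)

set.seed(42)   # resetting the same seed reproduces the same draws
second_draw <- rnorm(15, mean = 10.3, sd = 2.6)

identical(first_draw, second_draw)   # TRUE
```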

Let’s create a vector that codes the sex of our individual rodents. What is the rep() function doing?

individual <- 1:30
sex <- c(rep("M", 15), rep("F", 15))
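One way to explore this question is to run rep() on some small inputs and compare the results. The times and each parameters are part of rep(); the object names are made up for this sketch:

```r
three_m <- rep("M", 3)                      # "M" "M" "M"
alternating <- rep(c("M", "F"), times = 2)  # "M" "F" "M" "F"
blocked <- rep(c("M", "F"), each = 2)       # "M" "M" "F" "F"

three_m
alternating
blocked
```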

Combining data

Bind columns together

You now have objects representing a column of sexes and a column of home range sizes. Let’s bind them together into a single object using the cbind() function:

all_data <- cbind(individual, sex, home_ranges)
head(all_data)
     individual sex home_ranges       
[1,] "1"        "M" "9.72823332282816"
[2,] "2"        "M" "6.74699844969828"
[3,] "3"        "M" "12.6267899256032"
[4,] "4"        "M" "10.7824350820041"
[5,] "5"        "M" "13.4609851781335"
[6,] "6"        "M" "3.93099536312963"
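Notice that every value in the output above is enclosed in double quotes. cbind() has produced a matrix, and a matrix, like a vector, holds a single data type, so the numbers were converted to text. The sketch below checks this on a small, made-up dataset and shows data.frame() as one possible alternative that keeps a separate type for each column:

```r
individual <- 1:4
sex <- c(rep("M", 2), rep("F", 2))
home_ranges <- c(9.7, 6.7, 3.9, 10.3)   # made-up values for illustration

as_matrix <- cbind(individual, sex, home_ranges)
typeof(as_matrix)    # "character": everything was converted to text

as_frame <- data.frame(individual, sex, home_ranges)
str(as_frame)        # each column keeps its own type
```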

Bind rows together

The rbind() function does the same for rows, stacking objects on top of one another

For example, you may have radio-tracking data from several individuals stored in different files, with identical format. You could use rbind() to combine them into a single dataset
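As a minimal sketch of this idea, the two small data frames below stand in for data read from two files. The object names, column names, and values are all hypothetical:

```r
# Hypothetical tracking data for two individuals, in identical format
track_a <- data.frame(individual = "A", x = c(1.2, 1.5), y = c(3.4, 3.1))
track_b <- data.frame(individual = "B", x = c(2.8, 2.9), y = c(0.7, 1.0))

# Stack the rows of the second dataset beneath the first
tracks <- rbind(track_a, track_b)
tracks
nrow(tracks)   # 4
```

For data frames, rbind() expects the column names of the inputs to match, which is why an identical format across files matters.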