2 Basics

2.1 Prerequisites

This chapter introduces the basics of R programming. We review the basic operators and data types, and then introduce the basic R data structures, including tibbles and data tables.

Most of this chapter involves working with R’s basic operators and data types, which do not require any extra packages. We will also introduce the tibble package, which forms part of the tidyverse, in section 2.5, and the data.table package in section 2.6.

2.2 Basic operators

The basic arithmetic operators (+, -, *, /, ^, %/%, and %%) work like a calculator.

3 + 2 # addition
## [1] 5
3 - 2 # subtraction
## [1] 1
3 * 2 # multiplication
## [1] 6
3 / 2 # division
## [1] 1.5
3^2 # exponent
## [1] 9
3 %/% 2 # integer division
## [1] 1
3 %% 2 # mod (remainder of a division)
## [1] 1
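Integer division and the remainder are linked: for a non-zero y, x equals (x %/% y) * y + x %% y. The short sketch below checks this for a positive and a negative value. The names x and y are illustrative and do not appear elsewhere in this chapter.

```r
# illustrative values
x <- c(7, -7)
y <- 2

x %/% y # integer division rounds down, also for negative numbers
## [1]  3 -4
x %% y # the remainder takes the sign of the divisor
## [1] 1 1

# quotient and remainder rebuild the original values
(x %/% y) * y + x %% y == x
## [1] TRUE TRUE
```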

R uses the <- operator for assignment. You can read the following code as assigning the results of 2 + 3, 5, and 3 to the objects value_a, value_b, and value_c respectively, which store them for later use.

value_a <- 2 + 3
value_b <- 5
value_c <- 3

You can print what is stored in an object to the console with print().

print(value_a)
## [1] 5
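As a small sketch of how stored objects are reused, the code below prints an object by typing its name and then combines the three objects in a new expression. The name value_total is illustrative only.

```r
value_a <- 2 + 3
value_b <- 5
value_c <- 3

value_a # typing the name alone also prints the value
## [1] 5

# stored values can be reused in later expressions
value_total <- value_a + value_b + value_c
value_total
## [1] 13
```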

You can use relational operators to compare how one object relates to another.

2 + 3 == 5 # TRUE that 2 + 3 equals 5
## [1] TRUE
2 + 3 != 5 # FALSE that 2 + 3 is not equal to 5
## [1] FALSE
2 + 3 < 3 # FALSE that 2 + 3 is less than 3
## [1] FALSE
2 + 3 > 3 # TRUE that 2 + 3 is more than 3
## [1] TRUE
2 + 3 <= 5 # TRUE that 2 + 3 is less than or equal to 5
## [1] TRUE
2 + 3 >= 5 # TRUE that 2 + 3 is more than or equal to 5
## [1] TRUE
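One caveat when comparing non-integer numbers with == is floating-point rounding. The sketch below shows a well-known case and one common workaround using the base R function all.equal(), which compares within a small tolerance.

```r
0.1 + 0.2 == 0.3 # FALSE because of floating-point rounding
## [1] FALSE

isTRUE(all.equal(0.1 + 0.2, 0.3)) # TRUE within a small tolerance
## [1] TRUE
```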

You can use logical operators to connect two or more expressions, for example to combine the results of comparisons made with relational operators.

(2 + 3 == 5) && (2 + 3 < 3) # logical AND operator
## [1] FALSE
(2 + 3 == 5) || (2 + 3 >= 3) # logical OR operator
## [1] TRUE

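Note that the logical operators && and || are meant for single values (vectors of length one). When given longer vectors, R 4.2 examines only the first element and raises a warning, as in the output below. From R 4.3.0 onwards this is an error.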

x <- c(TRUE, TRUE, FALSE)
y <- c(FALSE, TRUE, FALSE)
x && y
## Warning in x && y: 'length(x) = 3 > 1' in coercion to 'logical(1)'

## Warning in x && y: 'length(x) = 3 > 1' in coercion to 'logical(1)'
## [1] FALSE
x || y
## Warning in x || y: 'length(x) = 3 > 1' in coercion to 'logical(1)'
## [1] TRUE

To perform element-wise logical operations, use & and | instead. The ! operator negates each element.

x & y
## [1] FALSE  TRUE FALSE
x | y
## [1]  TRUE  TRUE FALSE
!y
## [1]  TRUE FALSE  TRUE
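When you need a single TRUE or FALSE from a whole vector (for example as a condition), the base R functions any() and all() are one option. The sketch below reuses x and y from above.

```r
x <- c(TRUE, TRUE, FALSE)
y <- c(FALSE, TRUE, FALSE)

any(x) # is at least one element TRUE?
## [1] TRUE
all(x) # are all elements TRUE?
## [1] FALSE

# any() and all() return a single value, so they combine safely with && and ||
any(x) && any(y)
## [1] TRUE
all(x) || all(y)
## [1] FALSE
```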

2.3 Basic data types

R has six basic data types (also known as atomic data types).

Data Type | Examples               | Additional Information
Logical   | TRUE, FALSE            | Boolean values
Numeric   | 1, 999.9               | Default data type for numbers
Integer   | 1L, 999L               | L is used to denote an integer
Character | "a", "R for BES"       | Data type for one or more characters
Complex   | 2 + 3i                 | Data type for numbers with a real and imaginary component
Raw       | charToRaw("R for BES") | Not commonly used data type used to store raw bytes
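To check these types yourself, you can pass each example to class(). The sketch below also shows one is.*() test and one as.*() conversion from base R.

```r
class(TRUE)
## [1] "logical"
class(999.9)
## [1] "numeric"
class(1L)
## [1] "integer"
class("R for BES")
## [1] "character"
class(2 + 3i)
## [1] "complex"
class(charToRaw("R for BES"))
## [1] "raw"

# is.*() functions test for a type and as.*() functions convert between types
is.integer(1) # 1 without the L suffix is numeric, not integer
## [1] FALSE
as.integer("999")
## [1] 999
```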

2.4 Basic data structures

The basic data structures in R include factors, atomic vectors, lists, matrices, and data.frames.

Factors are used in R to represent categorical variables. Although they appear similar to character vectors, they are actually stored as integers. You can use the function levels() to list the categories (levels) and nlevels() to check the number of categories.

eye_color <- factor(c("brown", "black", "green", "brown", "black", "blue"))
nlevels(eye_color)
## [1] 4
levels(eye_color)
## [1] "black" "blue"  "brown" "green"
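The integer storage can be made visible with as.integer(), which returns the position of each value in levels(). This is a small sketch using the eye_color factor from above.

```r
eye_color <- factor(c("brown", "black", "green", "brown", "black", "blue"))

as.integer(eye_color) # position of each value in levels(eye_color)
## [1] 3 1 4 3 1 2

as.character(eye_color) # convert back to the original labels

table(eye_color) # count how often each level occurs
```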

Atomic vectors, more frequently referred to simply as vectors, are a data structure used to store multiple values of the same data type (logical, numeric, integer, character, complex, or raw). Vectors are one-indexed (i.e., the first element is accessed using [1]), and you can get the number of elements in a vector using the function length(). The function class() can be used to reveal the class of any object in R.

vec_num <- c(1, 2, 3, 4)
class(vec_num)
## [1] "numeric"

vec_char <- c("R", "for", "BES")
class(vec_char)
## [1] "character"

# coercion if data types are mixed
vec_mix <- c("R", 4, "BES")
class(vec_mix)
## [1] "character"

# you can easily combine vectors using the c function
c(vec_char, vec_mix)
## [1] "R"   "for" "BES" "R"   "4"   "BES"

# vector length
length(vec_num)
## [1] 4

# access first element of vector
vec_num[1]
## [1] 1
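Indexing is not limited to a single position. The sketch below shows a few other common forms of vector indexing in base R, together with element-wise arithmetic.

```r
vec_num <- c(1, 2, 3, 4)

vec_num[c(1, 3)] # first and third elements
## [1] 1 3
vec_num[-1] # everything except the first element
## [1] 2 3 4
vec_num[vec_num > 2] # elements that satisfy a condition
## [1] 3 4
vec_num * 2 # arithmetic is applied element by element
## [1] 2 4 6 8
```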

A list is an ordered data structure that can store multiple R objects of different types. The function list() is used to create a list, and its elements can be accessed using single brackets [] or double brackets [[]]. Using [] returns a list containing the selected element, while [[]] returns the selected element itself. Using the function length(), you can obtain the number of objects in a list.

my_list <- list(
    c(1, 2, 3, 4),
    c("a", "b"),
    1L,
    matrix(1:9, ncol = 3)
)

my_list_a <- my_list[1]
class(my_list_a)
## [1] "list"

my_list_b <- my_list[[1]]
class(my_list_b)
## [1] "numeric"

length(my_list)
## [1] 4

If you have named the elements in your list, you can also access them by specifying their names in the brackets or by using the $ operator.

named_list <- list(
    a = c(1, 2, 3, 4),
    b = c("a", "b"),
    c = 1L,
    d = matrix(1:9, ncol = 3)
)

class(named_list["a"])
## [1] "list"

class(named_list[["a"]])
## [1] "numeric"

class(named_list$a)
## [1] "numeric"
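The same accessors can be used to reach inside an element, and to add or remove elements. The sketch below uses a separate, illustrative list named small_list so that named_list is left unchanged.

```r
# small_list and the element name e are illustrative
small_list <- list(a = c(1, 2, 3, 4), b = c("a", "b"))

small_list[["b"]][2] # second value of the element named b
## [1] "b"

small_list$e <- TRUE # assigning to a new name adds an element
names(small_list)
## [1] "a" "b" "e"

small_list$a <- NULL # assigning NULL removes an element
length(small_list)
## [1] 2
```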

A matrix is a two-dimensional data structure that is used to store multiple objects of the same data type. You can use the function matrix() to create a matrix, using the ncol and nrow arguments to specify the number of columns and rows respectively, and the byrow argument to specify whether the data is filled in row by row (TRUE) or column by column (FALSE, the default).

matrix(1:12, ncol = 3, byrow = FALSE)
##      [,1] [,2] [,3]
## [1,]    1    5    9
## [2,]    2    6   10
## [3,]    3    7   11
## [4,]    4    8   12

matrix(1:12, nrow = 3, byrow = FALSE)
##      [,1] [,2] [,3] [,4]
## [1,]    1    4    7   10
## [2,]    2    5    8   11
## [3,]    3    6    9   12

matrix(1:12, ncol = 3, byrow = TRUE)
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
## [4,]   10   11   12

Aside from numeric data types, a matrix can also be used to store other data types as long as they are homogeneous. To store heterogeneous data types, you should use a data frame, which is introduced next.

matrix(c("brown", "black", "green", "brown", "black", "blue"), ncol = 2)
##      [,1]    [,2]   
## [1,] "brown" "brown"
## [2,] "black" "black"
## [3,] "green" "blue"

matrix(c("TRUE", "TRUE", "FALSE", "FALSE"), ncol = 2)
##      [,1]   [,2]   
## [1,] "TRUE" "FALSE"
## [2,] "TRUE" "FALSE"

matrix(c(1L, 2L, 3L, 4L), ncol = 2)
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4

You can access the values in a matrix by providing the row and column index. For example, [1, 2] would return the value stored in the first row, second column of the matrix, and [3, 1] would return the value stored in the third row, first column. You can retrieve all the values in a column by leaving the row index empty, and likewise retrieve all the values in a row by leaving the column index empty. For example, [, 2] would return all the values in column 2 and [2, ] would return all the values in row 2.

m <- matrix(1:12, nrow = 3, byrow = FALSE)

m[1, 2]
## [1] 4

m[3, 1]
## [1] 3

m[, 2]
## [1] 4 5 6

m[2, ]
## [1]  2  5  8 11
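Row and column indices can also be vectors, which selects a sub-matrix. The sketch below reuses m and additionally shows dim() and the drop = FALSE argument, which keeps a single column as a matrix instead of reducing it to a vector.

```r
m <- matrix(1:12, nrow = 3, byrow = FALSE)

dim(m) # number of rows, then number of columns
## [1] 3 4

m[1:2, c(1, 4)] # rows 1 to 2 of columns 1 and 4
##      [,1] [,2]
## [1,]    1   10
## [2,]    2   11

m[, 2, drop = FALSE] # keep the result as a one-column matrix
##      [,1]
## [1,]    4
## [2,]    5
## [3,]    6
```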

A data.frame is a two-dimensional data structure that is used to store heterogeneous data types in R. Because of this convenience, data frames are a commonly used data structure in R. You can use the function data.frame() to create a data frame.

df <- data.frame(
    x = c(1, 2, 3),
    y = c("red", "green", "blue"),
    z = c(TRUE, FALSE, TRUE)
)

You can access the elements of a data.frame as you would those of a list, using [], [[]], or $. Using [] returns a data.frame containing the selected element, while [[]] or $ reduces it to a vector.

df
##   x     y     z
## 1 1   red  TRUE
## 2 2 green FALSE
## 3 3  blue  TRUE

df["y"]
##       y
## 1   red
## 2 green
## 3  blue

df[["y"]]
## [1] "red"   "green" "blue"

df$y
## [1] "red"   "green" "blue"

You can also access a data.frame like a matrix.

df
##   x     y     z
## 1 1   red  TRUE
## 2 2 green FALSE
## 3 3  blue  TRUE

df[1, 2]
## [1] "red"

df[, 2]
## [1] "red"   "green" "blue"

df[2, ]
##   x     y     z
## 2 2 green FALSE
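Matrix-style indexing also accepts a logical condition for the rows and a column name for the columns. The sketch below recreates the small data frame from above, filters it, and adds a column. The column name w is illustrative.

```r
df <- data.frame(
    x = c(1, 2, 3),
    y = c("red", "green", "blue"),
    z = c(TRUE, FALSE, TRUE)
)

nrow(df)
## [1] 3
ncol(df)
## [1] 3

df[df$x > 1, "y"] # values of y in the rows where x is greater than 1

df$w <- df$x * 10 # assigning to a new name adds a column
names(df)
## [1] "x" "y" "z" "w"
```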

2.5 Tibbles

Tibbles are a modified version of R’s data.frame, so you can access a tibble in the same ways as a data.frame. You can create a tibble using the function tibble(). Alternatively, you can coerce a data frame into a tibble using as_tibble().

library(tibble)

tibble(
    x = c(1, 2, 3),
    y = c("red", "green", "blue"),
    z = c(TRUE, FALSE, TRUE)
)
## # A tibble: 3 × 3
##       x y     z    
##   <dbl> <chr> <lgl>
## 1     1 red   TRUE 
## 2     2 green FALSE
## 3     3 blue  TRUE

as_tibble(iris)
## # A tibble: 150 × 5
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
##  1          5.1         3.5          1.4         0.2 setosa 
##  2          4.9         3            1.4         0.2 setosa 
##  3          4.7         3.2          1.3         0.2 setosa 
##  4          4.6         3.1          1.5         0.2 setosa 
##  5          5           3.6          1.4         0.2 setosa 
##  6          5.4         3.9          1.7         0.4 setosa 
##  7          4.6         3.4          1.4         0.3 setosa 
....

A key difference lies in how tibbles are printed. Printing a tibble displays only the first ten rows, along with each column’s data type.

tb <- as_tibble(iris)
print(tb)
## # A tibble: 150 × 5
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
##  1          5.1         3.5          1.4         0.2 setosa 
##  2          4.9         3            1.4         0.2 setosa 
##  3          4.7         3.2          1.3         0.2 setosa 
##  4          4.6         3.1          1.5         0.2 setosa 
##  5          5           3.6          1.4         0.2 setosa 
##  6          5.4         3.9          1.7         0.4 setosa 
##  7          4.6         3.4          1.4         0.3 setosa 
....

df <- as.data.frame(iris)
print(df)
##     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1            5.1         3.5          1.4         0.2     setosa
## 2            4.9         3.0          1.4         0.2     setosa
## 3            4.7         3.2          1.3         0.2     setosa
## 4            4.6         3.1          1.5         0.2     setosa
## 5            5.0         3.6          1.4         0.2     setosa
## 6            5.4         3.9          1.7         0.4     setosa
## 7            4.6         3.4          1.4         0.3     setosa
## 8            5.0         3.4          1.5         0.2     setosa
## 9            4.4         2.9          1.4         0.2     setosa
....

Unlike a data.frame, a tibble is consistent about the data structure it returns. When indexing a tibble, [ always returns another tibble, while [[ and $ always return a vector. In contrast, selecting a single column of a data frame with [, j] returns an atomic vector unless drop = FALSE is specified.

class(tb[, 1])
## [1] "tbl_df"     "tbl"        "data.frame"

class(tb[[1]])
## [1] "numeric"

class(tb$Sepal.Length)
## [1] "numeric"

class(df[, 1])
## [1] "numeric"

class(df[, 1, drop = FALSE])
## [1] "data.frame"

Additionally, tibbles do not do partial matching: $ returns NULL with a warning unless the name is an exact match, whereas a data frame silently matches a unique prefix of a column name.

tb$Sepal.Lengt
## Warning: Unknown or uninitialised column: `Sepal.Lengt`.
## NULL

df$Sepal.Lengt
##   [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 5.4 4.8 4.8 4.3 5.8 5.7 5.4 5.1
##  [19] 5.7 5.1 5.4 5.1 4.6 5.1 4.8 5.0 5.0 5.2 5.2 4.7 4.8 5.4 5.2 5.5 4.9 5.0
##  [37] 5.5 4.9 4.4 5.1 5.0 4.5 4.4 5.0 5.1 4.8 5.1 4.6 5.3 5.0 7.0 6.4 6.9 5.5
##  [55] 6.5 5.7 6.3 4.9 6.6 5.2 5.0 5.9 6.0 6.1 5.6 6.7 5.6 5.8 6.2 5.6 5.9 6.1
##  [73] 6.3 6.1 6.4 6.6 6.8 6.7 6.0 5.7 5.5 5.5 5.8 6.0 5.4 6.0 6.7 6.3 5.6 5.5
##  [91] 5.5 6.1 5.8 5.0 5.6 5.7 5.7 6.2 5.1 5.7 6.3 5.8 7.1 6.3 6.5 7.6 4.9 7.3
## [109] 6.7 7.2 6.5 6.4 6.8 5.7 5.8 6.4 6.5 7.7 7.7 6.0 6.9 5.6 7.7 6.3 6.7 7.2
## [127] 6.2 6.1 6.4 7.2 7.4 7.9 6.4 6.3 6.1 7.7 6.3 6.4 6.0 6.9 6.7 6.9 5.8 6.8
## [145] 6.7 6.7 6.3 6.5 6.2 5.9
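If you stay with base data frames but want to avoid silent partial matching, one option is [[, which matches column names exactly by default. The sketch below uses a separate, illustrative object named df_iris.

```r
df_iris <- as.data.frame(iris) # df_iris is an illustrative name

is.null(df_iris[["Sepal.Lengt"]]) # [[ requires the exact column name by default
## [1] TRUE

"Sepal.Lengt" %in% names(df_iris) # check that a column exists before using it
## [1] FALSE
```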

You can read more about tibbles by typing vignette("tibble") in your console.

2.6 data.table

Like tibbles, data.tables are an enhanced version of data.frames. You can create a data.table using the function data.table(). You can also coerce existing R objects into a data.table with setDT() for data.frames and lists, and as.data.table() for other data structures. Note that as.data.table() also works with data.frames and lists. However, setDT() is more memory efficient because it does not create a copy of the original data frame or list; instead, it converts the object to a data.table by reference.

library(data.table)

dt <- data.table(
    x = c(1, 2, 3),
    y = c("red", "green", "blue"),
    z = c(TRUE, FALSE, TRUE)
)

class(
    setDT(
        data.frame(c(1, 2, 3))
    )
)
## [1] "data.table" "data.frame"
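The difference between the two conversion functions can be seen by checking the class of the original object afterwards. This is a minimal sketch, and the names df_a, df_b, and dt_a are illustrative.

```r
library(data.table)

df_a <- data.frame(x = c(1, 2, 3))
df_b <- data.frame(x = c(1, 2, 3))

dt_a <- as.data.table(df_a) # returns a new data.table, df_a is unchanged
class(df_a)
## [1] "data.frame"

setDT(df_b) # converts df_b itself, no assignment needed
class(df_b)
## [1] "data.table" "data.frame"
```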

data.tables provide additional functionality through the way they are queried. The general form for working with a data.table is dt[i, j, by], which can be read as: subset rows using i, operate on j, grouped by by.

Let’s see how this works using the iris example dataset.

dt <- as.data.table(iris)

print(dt)
##      Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
##   1:          5.1         3.5          1.4         0.2    setosa
##   2:          4.9         3.0          1.4         0.2    setosa
##   3:          4.7         3.2          1.3         0.2    setosa
##   4:          4.6         3.1          1.5         0.2    setosa
##   5:          5.0         3.6          1.4         0.2    setosa
##  ---                                                            
## 146:          6.7         3.0          5.2         2.3 virginica
## 147:          6.3         2.5          5.0         1.9 virginica
## 148:          6.5         3.0          5.2         2.0 virginica
....

If you want an explicit report of each column’s data type, as tibbles provide by default, you can set the option datatable.print.class to TRUE.

options(datatable.print.class = TRUE)

print(dt)
##      Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
##             <num>       <num>        <num>       <num>    <fctr>
##   1:          5.1         3.5          1.4         0.2    setosa
##   2:          4.9         3.0          1.4         0.2    setosa
##   3:          4.7         3.2          1.3         0.2    setosa
##   4:          4.6         3.1          1.5         0.2    setosa
##   5:          5.0         3.6          1.4         0.2    setosa
##  ---                                                            
## 146:          6.7         3.0          5.2         2.3 virginica
## 147:          6.3         2.5          5.0         1.9 virginica
....

You can filter the rows to keep only those where Species == "virginica".

dt[Species == "virginica"]
##     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
##            <num>       <num>        <num>       <num>    <fctr>
##  1:          6.3         3.3          6.0         2.5 virginica
##  2:          5.8         2.7          5.1         1.9 virginica
##  3:          7.1         3.0          5.9         2.1 virginica
##  4:          6.3         2.9          5.6         1.8 virginica
##  5:          6.5         3.0          5.8         2.2 virginica
##  6:          7.6         3.0          6.6         2.1 virginica
##  7:          4.9         2.5          4.5         1.7 virginica
##  8:          7.3         2.9          6.3         1.8 virginica
....
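The i expression is not limited to a single condition. The sketch below, which rebuilds the table under the illustrative name dt_iris, combines two conditions with &, selects rows by position, and sorts rows with order().

```r
library(data.table)
dt_iris <- as.data.table(iris) # dt_iris is an illustrative name

# combine conditions with & and count the matching rows
nrow(dt_iris[Species == "virginica" & Sepal.Length > 7])
## [1] 12

# select rows by position
nrow(dt_iris[1:3])
## [1] 3

# sort by Sepal.Length in descending order, then look at the largest value
dt_sorted <- dt_iris[order(-Sepal.Length)]
dt_sorted$Sepal.Length[1]
## [1] 7.9
```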

You can select columns using the j expression. Notice that wrapping the variables within list() or .() ensures that a data.table is returned, whereas a single unwrapped column name returns an atomic vector. Because .() is an alias for list(), the two forms are equivalent.

dt[, Sepal.Length]
##   [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 5.4 4.8 4.8 4.3 5.8 5.7 5.4 5.1
##  [19] 5.7 5.1 5.4 5.1 4.6 5.1 4.8 5.0 5.0 5.2 5.2 4.7 4.8 5.4 5.2 5.5 4.9 5.0
##  [37] 5.5 4.9 4.4 5.1 5.0 4.5 4.4 5.0 5.1 4.8 5.1 4.6 5.3 5.0 7.0 6.4 6.9 5.5
##  [55] 6.5 5.7 6.3 4.9 6.6 5.2 5.0 5.9 6.0 6.1 5.6 6.7 5.6 5.8 6.2 5.6 5.9 6.1
##  [73] 6.3 6.1 6.4 6.6 6.8 6.7 6.0 5.7 5.5 5.5 5.8 6.0 5.4 6.0 6.7 6.3 5.6 5.5
##  [91] 5.5 6.1 5.8 5.0 5.6 5.7 5.7 6.2 5.1 5.7 6.3 5.8 7.1 6.3 6.5 7.6 4.9 7.3
## [109] 6.7 7.2 6.5 6.4 6.8 5.7 5.8 6.4 6.5 7.7 7.7 6.0 6.9 5.6 7.7 6.3 6.7 7.2
## [127] 6.2 6.1 6.4 7.2 7.4 7.9 6.4 6.3 6.1 7.7 6.3 6.4 6.0 6.9 6.7 6.9 5.8 6.8
## [145] 6.7 6.7 6.3 6.5 6.2 5.9

class(dt[, Sepal.Length])
## [1] "numeric"

class(dt[, list(Sepal.Length)])
## [1] "data.table" "data.frame"

class(dt[, .(Sepal.Length)])
## [1] "data.table" "data.frame"

You can select multiple columns with list() or .().


dt[, .(Sepal.Length, Species)]
##      Sepal.Length   Species
##             <num>    <fctr>
##   1:          5.1    setosa
##   2:          4.9    setosa
##   3:          4.7    setosa
##   4:          4.6    setosa
##   5:          5.0    setosa
##  ---                       
## 146:          6.7 virginica
## 147:          6.3 virginica
....

You can also save the target column names in a variable and use the .. prefix to select those columns.

cols <- c("Sepal.Length", "Species")

dt[, ..cols]
##      Sepal.Length   Species
##             <num>    <fctr>
##   1:          5.1    setosa
##   2:          4.9    setosa
##   3:          4.7    setosa
##   4:          4.6    setosa
##   5:          5.0    setosa
##  ---                       
## 146:          6.7 virginica
## 147:          6.3 virginica
....
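data.table offers other ways to select columns by name from a character vector, such as the with = FALSE argument and the .SD symbol combined with .SDcols. The sketch below checks that all three forms return the same columns. The object names on the left-hand side are illustrative.

```r
library(data.table)
dt_iris <- as.data.table(iris) # dt_iris is an illustrative name
cols <- c("Sepal.Length", "Species")

by_prefix <- dt_iris[, ..cols] # .. prefix, as shown above
by_with <- dt_iris[, cols, with = FALSE] # treat cols as a vector of column names
by_sd <- dt_iris[, .SD, .SDcols = cols] # .SD restricted to the columns in cols

isTRUE(all.equal(by_prefix, by_with))
## [1] TRUE
isTRUE(all.equal(by_prefix, by_sd))
## [1] TRUE
```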

Aside from selecting columns, you can use j to carry out computations on one or more columns, optionally restricted to a subset of rows using i.


dt[, mean(Sepal.Length)]
## [1] 5.843333

dt[, .(
    Sepal.Length.Mean = mean(Sepal.Length),
    Sepal.Width.Mean = mean(Sepal.Width)
)]
##    Sepal.Length.Mean Sepal.Width.Mean
##                <num>            <num>
## 1:          5.843333         3.057333

dt[
    Species == "virginica" & Sepal.Length < 6,
    .(
        Sepal.Length.Mean = mean(Sepal.Length),
        Sepal.Width.Mean = mean(Sepal.Width)
    )
]
##    Sepal.Length.Mean Sepal.Width.Mean
##                <num>            <num>
## 1:          5.642857         2.714286

You can then use the by expression in data tables to perform computations by groups.


dt[, .(
    Sepal.Length.Mean = mean(Sepal.Length),
    Sepal.Width.Mean = mean(Sepal.Width)
),
by = Species
]
##       Species Sepal.Length.Mean Sepal.Width.Mean
##        <fctr>             <num>            <num>
## 1:     setosa             5.006            3.428
## 2: versicolor             5.936            2.770
## 3:  virginica             6.588            2.974

The special symbol .N, which counts the number of rows in each group, is particularly useful when combined with by.


dt[, .N, by = Species]
##       Species     N
##        <fctr> <int>
## 1:     setosa    50
## 2: versicolor    50
## 3:  virginica    50

You can also group by multiple columns or expressions using the list() or .() notation. You can read the code below as calculating the mean of Sepal.Length and the number of instances (given by .N), grouped by Species and by whether Sepal.Length < 6.

dt[, .(Sepal.Length.Mean = mean(Sepal.Length), .N),
    by = .(Species, Sepal.Length < 6)
]
##       Species Sepal.Length < 6 Sepal.Length.Mean     N
##        <fctr>           <lgcl>             <num> <int>
## 1:     setosa             TRUE          5.006000    50
## 2: versicolor            FALSE          6.375000    24
## 3: versicolor             TRUE          5.530769    26
## 4:  virginica            FALSE          6.741860    43
## 5:  virginica             TRUE          5.642857     7
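The grouping expression can be given a name inside .(), which gives a tidier column name than Sepal.Length < 6, and a second [ ] can be chained to sort the grouped result. This is a sketch under those assumptions, and the names dt_iris, res, and Short.Sepal are illustrative.

```r
library(data.table)
dt_iris <- as.data.table(iris)

# name the grouping expression so the result has a readable column name
res <- dt_iris[, .(Sepal.Length.Mean = mean(Sepal.Length), .N),
    by = .(Species, Short.Sepal = Sepal.Length < 6)
]
names(res)

# chain a second [ ] to sort the grouped result by group size, largest first
res[order(-N)]
```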

data.tables add, update, and delete columns by reference, which avoids redundant copies and improves performance. You can use the := operator in j to add, update, and delete columns by reference. There are two forms for using :=, namely [, LHS := RHS] and [, `:=`(LHS = RHS)].

dt <- as.data.table(iris)

dt[, Sepal.Sum := .(Sepal.Length + Sepal.Width)]
head(dt)
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Sum
##           <num>       <num>        <num>       <num>  <fctr>     <num>
## 1:          5.1         3.5          1.4         0.2  setosa       8.6
## 2:          4.9         3.0          1.4         0.2  setosa       7.9
## 3:          4.7         3.2          1.3         0.2  setosa       7.9
## 4:          4.6         3.1          1.5         0.2  setosa       7.7
## 5:          5.0         3.6          1.4         0.2  setosa       8.6
## 6:          5.4         3.9          1.7         0.4  setosa       9.3

dt[, `:=`(Petal.Sum = Petal.Length + Petal.Width)]
head(dt)
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Sum
##           <num>       <num>        <num>       <num>  <fctr>     <num>
## 1:          5.1         3.5          1.4         0.2  setosa       8.6
## 2:          4.9         3.0          1.4         0.2  setosa       7.9
## 3:          4.7         3.2          1.3         0.2  setosa       7.9
## 4:          4.6         3.1          1.5         0.2  setosa       7.7
## 5:          5.0         3.6          1.4         0.2  setosa       8.6
## 6:          5.4         3.9          1.7         0.4  setosa       9.3
##    Petal.Sum
##        <num>
....

Note that in the above code, we do not need to assign the result back to a variable because the modification is done by reference, or in place. In other words, we are modifying dt itself and not a copy of dt. Therefore, if you run the entire code chunk above, dt will contain both the Sepal.Sum and Petal.Sum columns.
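A consequence of modifying by reference is that a plain assignment such as dt_alias <- dt_orig does not create an independent table. The data.table function copy() does. The sketch below uses illustrative object names so that dt from the examples above is left untouched.

```r
library(data.table)
dt_orig <- as.data.table(iris)

dt_alias <- dt_orig # another name for the same data.table, not a copy
dt_copy <- copy(dt_orig) # an independent copy

dt_orig[, Sepal.Sum := Sepal.Length + Sepal.Width]

"Sepal.Sum" %in% names(dt_alias) # the alias sees the new column
## [1] TRUE
"Sepal.Sum" %in% names(dt_copy) # the copy does not
## [1] FALSE
```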

Since := is used in j, it can be combined with i and by as we have seen in the earlier parts of this sub-section.


dt[Species == "versicolor" | Species == "virginica", Sepal.Length := 0]
head(dt)
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Sum
##           <num>       <num>        <num>       <num>  <fctr>     <num>
## 1:          5.1         3.5          1.4         0.2  setosa       8.6
## 2:          4.9         3.0          1.4         0.2  setosa       7.9
## 3:          4.7         3.2          1.3         0.2  setosa       7.9
## 4:          4.6         3.1          1.5         0.2  setosa       7.7
## 5:          5.0         3.6          1.4         0.2  setosa       8.6
## 6:          5.4         3.9          1.7         0.4  setosa       9.3
##    Petal.Sum
##        <num>
....

dt[, Sepal.Length.Mean := mean(Sepal.Length),
    by = .(Species)
]
head(dt)
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Sum
##           <num>       <num>        <num>       <num>  <fctr>     <num>
## 1:          5.1         3.5          1.4         0.2  setosa       8.6
## 2:          4.9         3.0          1.4         0.2  setosa       7.9
## 3:          4.7         3.2          1.3         0.2  setosa       7.9
## 4:          4.6         3.1          1.5         0.2  setosa       7.7
## 5:          5.0         3.6          1.4         0.2  setosa       8.6
## 6:          5.4         3.9          1.7         0.4  setosa       9.3
##    Petal.Sum Sepal.Length.Mean
##        <num>             <num>
....
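The examples above add and update columns. Deleting a column by reference works by assigning NULL, and the functional form of := can set several columns in one call. The sketch below uses an illustrative table named dt_demo.

```r
library(data.table)
dt_demo <- as.data.table(iris)

# add two columns in one call with the functional form
dt_demo[, `:=`(
    Sepal.Sum = Sepal.Length + Sepal.Width,
    Petal.Sum = Petal.Length + Petal.Width
)]
ncol(dt_demo)
## [1] 7

# assigning NULL deletes a column by reference
dt_demo[, Sepal.Sum := NULL]
"Sepal.Sum" %in% names(dt_demo)
## [1] FALSE
ncol(dt_demo)
## [1] 6
```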

You can find out more about data.tables by typing vignette(package = "data.table") into the console.