2 Basics
2.1 Prerequisites
This chapter introduces the basics of R programming. We review the basic operators and data types, and provide an introduction to the basic R data structures (including tibbles and data tables).
Most of this chapter works with R's basic operators and data types, which do not require any extra packages. We also introduce the tibble package, which is part of the tidyverse, in section 2.5, and the data.table package in section 2.6.
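The tibble and data.table examples later in this chapter call functions such as tibble() and setDT() directly, so both packages must be loaded first. Below is a minimal setup sketch. It assumes the packages are installed from CRAN; the install line only needs to run once per machine.

```r
# Run once if the packages are not yet installed:
# install.packages(c("tibble", "data.table"))

# Load the packages for the current session
library(tibble)      # provides tibble() and as_tibble()
library(data.table)  # provides data.table(), setDT() and as.data.table()
```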
2.2 Basic operators
Basic arithmetic operators (+, -, *, /, ^, %/%, and %%) work just like those on a calculator.
3 + 2 # addition
## [1] 5
3 - 2 # subtraction
## [1] 1
3 * 2 # multiplication
## [1] 6
3 / 2 # division
## [1] 1.5
3^2 # exponent
## [1] 9
3 %/% 2 # integer division
## [1] 1
3 %% 2 # mod (remainder of a division)
## [1] 1
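Integer division and the remainder are two halves of the same division. For the numbers above, (a %/% b) * b + a %% b gives back a. The sketch below checks this identity, using a and b as illustrative names. It also shows the calculator-style precedence: ^ is evaluated first, then * and /, then + and -. Parentheses override that order.

```r
a <- 3
b <- 2

# integer division and remainder recombine into the original number
(a %/% b) * b + a %% b  # 3

# precedence: ^ first, then * and /, then + and -
3 + 2 * 2^2             # 11, i.e. 3 + (2 * 4)
(3 + 2) * 2             # 10, parentheses are evaluated first
```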
R uses the <- operator for assignment. You can read the following code as assigning the results of 2 + 3, 5, and 3 to the objects value_a, value_b, and value_c respectively, which stores them for later use.
value_a <- 2 + 3
value_b <- 5
value_c <- 3
You can print what is stored in an object to the console with print().
print(value_a)
## [1] 5
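Once values are stored, you can reuse the objects in later expressions. At the console, typing an object's name on its own also prints it (auto-printing), which is equivalent to calling print(). In this sketch, total is a hypothetical name introduced only for illustration.

```r
value_a <- 2 + 3
value_b <- 5
value_c <- 3

# stored objects can be combined in new expressions; * is evaluated before +
total <- value_a + value_b * value_c
total          # auto-printed at the console: 20
print(total)   # explicit printing gives the same output
```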
You can use relational operators to compare two values. Each comparison returns TRUE or FALSE.
2 + 3 == 5 # TRUE that 2 + 3 equals 5
## [1] TRUE
2 + 3 != 5 # FALSE that 2 + 3 is not equal to 5
## [1] FALSE
2 + 3 < 3 # FALSE that 2 + 3 is less than 3
## [1] FALSE
2 + 3 > 3 # TRUE that 2 + 3 is greater than 3
## [1] TRUE
2 + 3 <= 5 # TRUE that 2 + 3 is less than or equal to 5
## [1] TRUE
2 + 3 >= 5 # TRUE that 2 + 3 is greater than or equal to 5
## [1] TRUE
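Relational operators are vectorized. If you compare a vector with a single value, you get one TRUE or FALSE per element.

Be careful when comparing decimal numbers. Floating-point rounding can make == report FALSE for values that look equal. A tolerance-based comparison is safer: all.equal() compares within a small tolerance, and wrapping it in isTRUE() turns the result into a single TRUE or FALSE.

```r
c(1, 5, 9) > 3                      # FALSE TRUE TRUE: one result per element

0.1 + 0.2 == 0.3                    # FALSE, because of floating-point rounding
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE: compares within a small tolerance
```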
You can use logical operators to combine two or more expressions, for example the results of comparisons made with relational operators.
(2 + 3 == 5) && (2 + 3 < 3) # logical AND operator
## [1] FALSE
(2 + 3 == 5) || (2 + 3 >= 3) # logical OR operator
## [1] TRUE
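Note that the logical operators && and || are meant for single TRUE or FALSE values. In R 4.2, when given longer vectors they examine only the first element and emit a warning, as shown below. From R 4.3.0 onward, this is an error.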
x <- c(TRUE, TRUE, FALSE)
y <- c(FALSE, TRUE, FALSE)
x && y
## Warning in x && y: 'length(x) = 3 > 1' in coercion to 'logical(1)'
## Warning in x && y: 'length(x) = 3 > 1' in coercion to 'logical(1)'
## [1] FALSE
x || y
## Warning in x || y: 'length(x) = 3 > 1' in coercion to 'logical(1)'
## [1] TRUE
To perform element-wise logical operations, use & and | instead. The ! operator negates each element.
x & y
## [1] FALSE TRUE FALSE
x | y
## [1] TRUE TRUE FALSE
!y
## [1] TRUE FALSE TRUE
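Sometimes you need a single TRUE or FALSE from vectors, for example as the condition of an if statement. A common idiom is to compute the element-wise result with & or |, then reduce it with any() or all(). The name both is illustrative.

```r
x <- c(TRUE, TRUE, FALSE)
y <- c(FALSE, TRUE, FALSE)

both <- x & y   # element-wise: FALSE TRUE FALSE
any(both)       # TRUE: at least one position is TRUE in both vectors
all(both)       # FALSE: not every position is TRUE in both vectors

# any() and all() return a single value, so they are safe to use with && and ||
if (any(both) && !all(both)) {
  message("x and y are both TRUE at some, but not all, positions")
}
```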
2.3 Basic data types
R has six basic data types (also known as atomic data types), summarized below.

Data Type | Examples | Additional Information |
---|---|---|
Logical | TRUE, FALSE | Boolean values |
Numeric | 1, 999.9 | Default data type for numbers |
Integer | 1L, 999L | The L suffix denotes an integer |
Character | "a", "R for BES" | Data type for one or more characters |
Complex | 2 + 3i | Data type for numbers with a real and an imaginary component |
Raw | charToRaw("R for BES") | Rarely used data type for storing raw bytes |
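Two functions tell you which of these types an object has: typeof() reports R's internal storage type, and class() reports the object's class. Note that numbers are stored as doubles (class "numeric") unless you add the L suffix.

```r
typeof(TRUE)                    # "logical"
typeof(999.9)                   # "double"; class(999.9) is "numeric"
typeof(999L)                    # "integer"
typeof("R for BES")             # "character"
typeof(2 + 3i)                  # "complex"
typeof(charToRaw("R for BES"))  # "raw"

is.integer(1)   # FALSE: 1 without the L suffix is a double
is.integer(1L)  # TRUE
```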
2.4 Basic data structures
The basic data structures in R include factors, atomic vectors, lists, matrices, and data.frames.
Factors are used in R to represent categorical variables. Although they look similar to character vectors, they are actually stored as integers. You can use the function levels() to list the categories (levels) of a factor and nlevels() to check the number of levels.
eye_color <- factor(c("brown", "black", "green", "brown", "black", "blue"))
nlevels(eye_color)
## [1] 4
levels(eye_color)
## [1] "black" "blue" "brown" "green"
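You can see the integer storage directly. Each value is a code pointing into the levels, which by default are sorted alphabetically. You can also supply the levels argument to control their order. eye_color2 is a hypothetical name used for illustration.

```r
eye_color <- factor(c("brown", "black", "green", "brown", "black", "blue"))

typeof(eye_color)      # "integer": the underlying storage
as.integer(eye_color)  # 3 1 4 3 1 2: codes into black, blue, brown, green
table(eye_color)       # number of observations per level

# supplying levels explicitly controls their order
eye_color2 <- factor(c("brown", "black", "blue"),
                     levels = c("blue", "brown", "black"))
levels(eye_color2)     # "blue" "brown" "black"
```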
Atomic vectors, more often simply called vectors, store multiple elements of the same data type (logical, numeric, integer, character, complex, or raw). Vectors are one-indexed (i.e., the first element is accessed with [1]), and you can get the number of elements in a vector with the function length(). The function class() reveals the class of any object in R.
vec_num <- c(1, 2, 3, 4)
class(vec_num)
## [1] "numeric"
vec_char <- c("R", "for", "BES")
class(vec_char)
## [1] "character"
# coercion if data types are mixed
vec_mix <- c("R", 4, "BES")
class(vec_mix)
## [1] "character"
# you can easily combine vectors using the c function
c(vec_char, vec_mix)
## [1] "R" "for" "BES" "R" "4" "BES"
# vector length
length(vec_num)
## [1] 4
# access first element of vector
vec_num[1]
## [1] 1
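The [] operator accepts more than a single position:

- several indices at once;
- negative indices, which drop elements;
- logical vectors, which keep the elements where the condition is TRUE.

When types are mixed in c(), R coerces all elements to the most flexible type, in the order logical < integer < double (numeric) < complex < character. This is why vec_mix above became a character vector.

```r
vec_num <- c(1, 2, 3, 4)

vec_num[c(1, 3)]       # 1 3: several positions at once
vec_num[-1]            # 2 3 4: a negative index drops the first element
vec_num[vec_num > 2]   # 3 4: logical indexing keeps the matching elements
vec_num[5]             # NA: indexing past the end returns NA

class(c(TRUE, 2L))     # "integer": logical is coerced to integer
class(c(2L, 2.5))      # "numeric": integer is coerced to double
class(c(2.5, "a"))     # "character"
```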
Lists are an ordered data structure used to store multiple R objects of different types. The function list() creates a list. You can access the elements of a list using single brackets [] or double brackets [[]]. Using [] returns a list containing the selected element, while [[]] returns the selected element itself. The function length() gives the number of objects in a list.
my_list <- list(
c(1, 2, 3, 4),
c("a", "b"),
1L,
matrix(1:9, ncol = 3)
)
my_list_a <- my_list[1]
class(my_list_a)
## [1] "list"
my_list_b <- my_list[[1]]
class(my_list_b)
## [1] "numeric"
length(my_list)
## [1] 4
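Single brackets can also select several elements at once, returning a shorter list. Double brackets extract one element, which you can then index further. Below is a short sketch using my_list from above; sub_list is a hypothetical name.

```r
my_list <- list(
  c(1, 2, 3, 4),
  c("a", "b"),
  1L,
  matrix(1:9, ncol = 3)
)

sub_list <- my_list[c(1, 2)]  # a list holding the first two elements
length(sub_list)              # 2

my_list[[1]][2]               # 2: second value of the vector stored first
my_list[[4]][1, ]             # 1 4 7: first row of the matrix stored fourth
```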
If you have named the elements of your list, you can also access them by name inside the brackets or with the $ operator.
named_list <- list(
a = c(1, 2, 3, 4),
b = c("a", "b"),
c = 1L,
d = matrix(1:9, ncol = 3)
)
class(named_list["a"])
## [1] "list"
class(named_list[["a"]])
## [1] "numeric"
class(named_list$a)
## [1] "numeric"
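You can also modify named lists with $:

- assigning to a new name adds an element;
- assigning NULL removes an element.

The names() function returns the element names. The element e below is a hypothetical addition for illustration.

```r
named_list <- list(
  a = c(1, 2, 3, 4),
  b = c("a", "b"),
  c = 1L,
  d = matrix(1:9, ncol = 3)
)

names(named_list)       # "a" "b" "c" "d"
named_list$e <- "new"   # add an element named e
named_list$c <- NULL    # remove the element named c
names(named_list)       # "a" "b" "d" "e"
```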
A matrix is a two-dimensional data structure used to store multiple elements of the same data type. You can use the function matrix() to create a matrix. Its main arguments are:

- ncol and nrow, which specify the number of columns and rows respectively;
- byrow, which specifies whether the data fill the matrix row by row (TRUE) or column by column (FALSE, the default).
matrix(1:12, ncol = 3, byrow = FALSE)
## [,1] [,2] [,3]
## [1,] 1 5 9
## [2,] 2 6 10
## [3,] 3 7 11
## [4,] 4 8 12
matrix(1:12, nrow = 3, byrow = FALSE)
## [,1] [,2] [,3] [,4]
## [1,] 1 4 7 10
## [2,] 2 5 8 11
## [3,] 3 6 9 12
matrix(1:12, ncol = 3, byrow = TRUE)
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
## [4,] 10 11 12
Besides numbers, a matrix can store other data types, as long as all elements share the same type. To store heterogeneous data types, use a data frame, which is introduced next.
matrix(c("brown", "black", "green", "brown", "black", "blue"), ncol = 2)
## [,1] [,2]
## [1,] "brown" "brown"
## [2,] "black" "black"
## [3,] "green" "blue"
matrix(c(TRUE, TRUE, FALSE, FALSE), ncol = 2)
## [,1] [,2]
## [1,] TRUE FALSE
## [2,] TRUE FALSE
matrix(c(1L, 2L, 3L, 4L), ncol = 2)
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
You can access the values in a matrix by providing the row and column indices. For example, [1, 2] returns the value stored in the first row, second column of the matrix, and [3, 1] returns the value stored in the third row, first column. You can retrieve all the values in a column by leaving the row index empty, and likewise all the values in a row by leaving the column index empty. For example, [, 2] returns all the values in column 2 and [2, ] returns all the values in row 2.
m <- matrix(1:12, nrow = 3, byrow = FALSE)
m[1, 2]
## [1] 4
m[3, 1]
## [1] 3
m[, 2]
## [1] 4 5 6
m[2, ]
## [1] 2 5 8 11
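You can check a matrix's size with dim(), nrow(), and ncol(). As shown above, selecting a single row or column simplifies the result to a plain vector. Adding drop = FALSE keeps the result as a one-column (or one-row) matrix. The t() function transposes a matrix.

```r
m <- matrix(1:12, nrow = 3, byrow = FALSE)

dim(m)                # 3 4: rows, then columns
nrow(m)               # 3
ncol(m)               # 4

m[, 2]                # a plain vector: 4 5 6
m[, 2, drop = FALSE]  # still a matrix, with 3 rows and 1 column
t(m)                  # transpose: 4 rows and 3 columns
```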
A data.frame is a two-dimensional data structure used to store heterogeneous data types in R. All values within a column share one type, but different columns can have different types. Because of this convenience, data frames are one of the most commonly used data structures in R. You can use the function data.frame() to create a data frame.
df <- data.frame(
x = c(1, 2, 3),
y = c("red", "green", "blue"),
z = c(TRUE, FALSE, TRUE)
)
You can access the columns of a data.frame as you would the elements of a list, with [], [[]], or $. Using [] returns a data.frame containing the selected column, while [[]] or $ reduces it to a vector.
df
## x y z
## 1 1 red TRUE
## 2 2 green FALSE
## 3 3 blue TRUE
df["y"]
## y
## 1 red
## 2 green
## 3 blue
df[["y"]]
## [1] "red" "green" "blue"
df$y
## [1] "red" "green" "blue"
You can also access a data.frame like a matrix.
df
## x y z
## 1 1 red TRUE
## 2 2 green FALSE
## 3 3 blue TRUE
df[1, 2]
## [1] "red"
df[, 2]
## [1] "red" "green" "blue"
df[2, ]
## x y z
## 2 2 green FALSE
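A few everyday operations on a data frame:

- str() summarizes the type of each column;
- assigning to a new name with $ adds a column;
- a logical vector in the row position keeps the matching rows.

The column w is a hypothetical addition for illustration.

```r
df <- data.frame(
  x = c(1, 2, 3),
  y = c("red", "green", "blue"),
  z = c(TRUE, FALSE, TRUE)
)

str(df)                    # one line per column, with its type
df$w <- df$x * 10          # add a new column computed from x
df[df$z, ]                 # rows where z is TRUE (rows 1 and 3)
df[df$x > 1, c("x", "y")]  # rows where x > 1, columns x and y only
dim(df)                    # 3 4 after adding w
```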
2.5 Tibbles
Tibbles are a modified version of R's data.frame, so you access tibbles the same way you would access a data.frame. After loading the tibble package with library(tibble), you can create a tibble with the function tibble(). Alternatively, you can coerce a data frame into a tibble with as_tibble().
tibble(
x = c(1, 2, 3),
y = c("red", "green", "blue"),
z = c(TRUE, FALSE, TRUE)
)
## # A tibble: 3 × 3
## x y z
## <dbl> <chr> <lgl>
## 1 1 red TRUE
## 2 2 green FALSE
## 3 3 blue TRUE
as_tibble(iris)
## # A tibble: 150 × 5
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## <dbl> <dbl> <dbl> <dbl> <fct>
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 7 4.6 3.4 1.4 0.3 setosa
....
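Creating a tibble differs from creating a data frame in two ways:

- tibble() builds columns in order, so a later column can refer to one defined earlier in the same call;
- it recycles only inputs of length one.

To convert a tibble back to a plain data frame, use as.data.frame(). This sketch assumes the tibble package is loaded; tb_small is a hypothetical name.

```r
library(tibble)

tb_small <- tibble(
  x = c(1, 2, 3),
  y = x * 2,      # refers to x, defined just above in the same call
  z = "same"      # a length-one value is recycled to every row
)
tb_small

class(as.data.frame(tb_small))  # "data.frame"
```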
A key difference lies in how tibbles are printed. Printing a large tibble displays only the first ten rows (and only the columns that fit on the screen), and reports each column's data type explicitly. Printing a data.frame, in contrast, attempts to display every row.
tb <- as_tibble(iris)
print(tb)
## # A tibble: 150 × 5
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## <dbl> <dbl> <dbl> <dbl> <fct>
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 7 4.6 3.4 1.4 0.3 setosa
....
df <- as.data.frame(iris)
print(df)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 7 4.6 3.4 1.4 0.3 setosa
## 8 5.0 3.4 1.5 0.2 setosa
## 9 4.4 2.9 1.4 0.2 setosa
....
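You can override these printing defaults:

- the print() method for tibbles accepts an n argument that sets how many rows are shown;
- width = Inf prints every column, regardless of the screen width.

For a plain data frame, head() is the usual way to avoid printing every row.

```r
library(tibble)

tb <- as_tibble(iris)
print(tb, n = 3)              # first 3 rows only
print(tb, width = Inf)        # all columns, even on a narrow console
head(as.data.frame(iris), 3)  # first 3 rows of a plain data frame
```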
Unlike a data.frame, a tibble is consistent about the data structure it returns. When indexing a tibble, [ always returns another tibble, while [[ and $ always return a vector. In contrast, selecting a single column of a data frame with [ (for example, df[, 1]) simplifies the result to an atomic vector unless drop = FALSE is specified.
class(tb[, 1])
## [1] "tbl_df" "tbl" "data.frame"
class(tb[[1]])
## [1] "numeric"
class(tb$Sepal.Length)
## [1] "numeric"
class(df[, 1])
## [1] "numeric"
class(df[, 1, drop = FALSE])
## [1] "data.frame"
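Additionally, tibbles do not perform partial matching of column names with $. Unless the name matches a column exactly, a tibble returns NULL with a warning, whereas a data.frame silently returns the partially matched column.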
tb$Sepal.Lengt
## Warning: Unknown or uninitialised column: `Sepal.Lengt`.
## NULL
df$Sepal.Lengt
## [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 5.4 4.8 4.8 4.3 5.8 5.7 5.4 5.1
## [19] 5.7 5.1 5.4 5.1 4.6 5.1 4.8 5.0 5.0 5.2 5.2 4.7 4.8 5.4 5.2 5.5 4.9 5.0
## [37] 5.5 4.9 4.4 5.1 5.0 4.5 4.4 5.0 5.1 4.8 5.1 4.6 5.3 5.0 7.0 6.4 6.9 5.5
## [55] 6.5 5.7 6.3 4.9 6.6 5.2 5.0 5.9 6.0 6.1 5.6 6.7 5.6 5.8 6.2 5.6 5.9 6.1
## [73] 6.3 6.1 6.4 6.6 6.8 6.7 6.0 5.7 5.5 5.5 5.8 6.0 5.4 6.0 6.7 6.3 5.6 5.5
## [91] 5.5 6.1 5.8 5.0 5.6 5.7 5.7 6.2 5.1 5.7 6.3 5.8 7.1 6.3 6.5 7.6 4.9 7.3
## [109] 6.7 7.2 6.5 6.4 6.8 5.7 5.8 6.4 6.5 7.7 7.7 6.0 6.9 5.6 7.7 6.3 6.7 7.2
## [127] 6.2 6.1 6.4 7.2 7.4 7.9 6.4 6.3 6.1 7.7 6.3 6.4 6.0 6.9 6.7 6.9 5.8 6.8
## [145] 6.7 6.7 6.3 6.5 6.2 5.9
You can read more about tibbles by typing vignette("tibble") in your console.
2.6 data.table
Like tibbles, data.tables are an enhanced version of data.frames. You can create a data.table using the function data.table(). You can also coerce existing R objects into a data.table:

- setDT() works on data.frames and lists;
- as.data.table() works on other data structures, and also on data.frames and lists.

However, setDT() is more memory efficient. It does not create a copy of the original data frame or list; instead, it converts the object into a data.table by reference (in place).
dt <- data.table(
x = c(1, 2, 3),
y = c("red", "green", "blue"),
z = c(TRUE, FALSE, TRUE)
)
class(
setDT(
data.frame(c(1, 2, 3))
)
)
## [1] "data.table" "data.frame"
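The difference between setDT() and as.data.table() shows up in what happens to the original object. setDT() converts it in place, so no assignment is needed. as.data.table() leaves the original untouched and returns a converted copy. df_a, df_b, and dt_b are hypothetical names.

```r
library(data.table)

df_a <- data.frame(x = c(1, 2, 3))
setDT(df_a)          # converts df_a itself; no assignment needed
is.data.table(df_a)  # TRUE

df_b <- data.frame(x = c(1, 2, 3))
dt_b <- as.data.table(df_b)  # returns a converted copy
is.data.table(df_b)  # FALSE: the original is still a plain data.frame
is.data.table(dt_b)  # TRUE
```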
data.tables provide additional functionality through the way they are queried. The general form for working with a data table is dt[i, j, by], which can be read as: subset rows using i, operate on j, and group by by. Let's see how this works using the iris example dataset.
dt <- as.data.table(iris)
print(dt)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1: 5.1 3.5 1.4 0.2 setosa
## 2: 4.9 3.0 1.4 0.2 setosa
## 3: 4.7 3.2 1.3 0.2 setosa
## 4: 4.6 3.1 1.5 0.2 setosa
## 5: 5.0 3.6 1.4 0.2 setosa
## ---
## 146: 6.7 3.0 5.2 2.3 virginica
## 147: 6.3 2.5 5.0 1.9 virginica
## 148: 6.5 3.0 5.2 2.0 virginica
....
If you want an explicit report of each column's data type, as a tibble gives by default, you can set the datatable.print.class option to TRUE.
options(datatable.print.class = TRUE)
print(dt)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## <num> <num> <num> <num> <fctr>
## 1: 5.1 3.5 1.4 0.2 setosa
## 2: 4.9 3.0 1.4 0.2 setosa
## 3: 4.7 3.2 1.3 0.2 setosa
## 4: 4.6 3.1 1.5 0.2 setosa
## 5: 5.0 3.6 1.4 0.2 setosa
## ---
## 146: 6.7 3.0 5.2 2.3 virginica
## 147: 6.3 2.5 5.0 1.9 virginica
....
You can filter rows with an expression in i, for example to keep only the rows where Species == "virginica".
dt[Species == "virginica"]
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## <num> <num> <num> <num> <fctr>
## 1: 6.3 3.3 6.0 2.5 virginica
## 2: 5.8 2.7 5.1 1.9 virginica
## 3: 7.1 3.0 5.9 2.1 virginica
## 4: 6.3 2.9 5.6 1.8 virginica
## 5: 6.5 3.0 5.8 2.2 virginica
## 6: 7.6 3.0 6.6 2.1 virginica
## 7: 4.9 2.5 4.5 1.7 virginica
## 8: 7.3 2.9 6.3 1.8 virginica
....
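Expressions in i can do more than a single filter:

- combine several conditions with & and |;
- sort the rows with order(), where a minus sign sorts in decreasing order.

Because each query returns a data.table, you can chain queries by appending another [...].

```r
library(data.table)
dt <- as.data.table(iris)

# rows matching both conditions
dt[Species == "virginica" & Sepal.Length > 7]

# all rows, sorted by decreasing Sepal.Length
dt[order(-Sepal.Length)]

# chaining: sort first, then keep the first three rows
dt[order(-Sepal.Length)][1:3]
```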
You can select columns using the j expression. Wrapping the variables within list() or .() ensures that a data.table is returned; without list() or .(), an atomic vector is returned instead. .() is an alias for list(), so the two are equivalent.
dt[, Sepal.Length]
## [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 5.4 4.8 4.8 4.3 5.8 5.7 5.4 5.1
## [19] 5.7 5.1 5.4 5.1 4.6 5.1 4.8 5.0 5.0 5.2 5.2 4.7 4.8 5.4 5.2 5.5 4.9 5.0
## [37] 5.5 4.9 4.4 5.1 5.0 4.5 4.4 5.0 5.1 4.8 5.1 4.6 5.3 5.0 7.0 6.4 6.9 5.5
## [55] 6.5 5.7 6.3 4.9 6.6 5.2 5.0 5.9 6.0 6.1 5.6 6.7 5.6 5.8 6.2 5.6 5.9 6.1
## [73] 6.3 6.1 6.4 6.6 6.8 6.7 6.0 5.7 5.5 5.5 5.8 6.0 5.4 6.0 6.7 6.3 5.6 5.5
## [91] 5.5 6.1 5.8 5.0 5.6 5.7 5.7 6.2 5.1 5.7 6.3 5.8 7.1 6.3 6.5 7.6 4.9 7.3
## [109] 6.7 7.2 6.5 6.4 6.8 5.7 5.8 6.4 6.5 7.7 7.7 6.0 6.9 5.6 7.7 6.3 6.7 7.2
## [127] 6.2 6.1 6.4 7.2 7.4 7.9 6.4 6.3 6.1 7.7 6.3 6.4 6.0 6.9 6.7 6.9 5.8 6.8
## [145] 6.7 6.7 6.3 6.5 6.2 5.9
class(dt[, Sepal.Length])
## [1] "numeric"
class(dt[, list(Sepal.Length)])
## [1] "data.table" "data.frame"
class(dt[, .(Sepal.Length)])
## [1] "data.table" "data.frame"
You can select multiple columns with list() or .().
dt[, .(Sepal.Length, Species)]
## Sepal.Length Species
## <num> <fctr>
## 1: 5.1 setosa
## 2: 4.9 setosa
## 3: 4.7 setosa
## 4: 4.6 setosa
## 5: 5.0 setosa
## ---
## 146: 6.7 virginica
## 147: 6.3 virginica
....
You can also store the target column names in a character vector, then select those columns by prefixing the vector's name with .. (two dots).
cols <- c("Sepal.Length", "Species")
dt[, ..cols]
## Sepal.Length Species
## <num> <fctr>
## 1: 5.1 setosa
## 2: 4.9 setosa
## 3: 4.7 setosa
## 4: 4.6 setosa
## 5: 5.0 setosa
## ---
## 146: 6.7 virginica
## 147: 6.3 virginica
....
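An equivalent way to select columns stored in a character vector is the with = FALSE argument. It tells data.table to treat j as a set of column names rather than as an expression to evaluate. The sketch below compares the two forms; sel_dots and sel_with are hypothetical names.

```r
library(data.table)
dt <- as.data.table(iris)
cols <- c("Sepal.Length", "Species")

sel_dots <- dt[, ..cols]               # the .. prefix form
sel_with <- dt[, cols, with = FALSE]   # the with = FALSE form

all.equal(sel_dots, sel_with)  # compares the two selections
names(sel_dots)                # "Sepal.Length" "Species"
```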
Besides selecting columns, you can carry out computations in j involving one or more columns, optionally restricted to a subset of rows selected with i.
dt[, mean(Sepal.Length)]
## [1] 5.843333
dt[, .(
Sepal.Length.Mean = mean(Sepal.Length),
Sepal.Width.Mean = mean(Sepal.Width)
)]
## Sepal.Length.Mean Sepal.Width.Mean
## <num> <num>
## 1: 5.843333 3.057333
dt[
Species == "virginica" & Sepal.Length < 6,
.(
Sepal.Length.Mean = mean(Sepal.Length),
Sepal.Width.Mean = mean(Sepal.Width)
)
]
## Sepal.Length.Mean Sepal.Width.Mean
## <num> <num>
## 1: 5.642857 2.714286
You can then use the by argument to perform computations by group.
dt[, .(
Sepal.Length.Mean = mean(Sepal.Length),
Sepal.Width.Mean = mean(Sepal.Width)
),
by = Species
]
## Species Sepal.Length.Mean Sepal.Width.Mean
## <fctr> <num> <num>
## 1: setosa 5.006 3.428
## 2: versicolor 5.936 2.770
## 3: virginica 6.588 2.974
The .N
variable that counts the number of instances is particularly useful
when combined with by
.
dt[, .N, by = Species]
## Species N
## <fctr> <int>
## 1: setosa 50
## 2: versicolor 50
## 3: virginica 50
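You can also use .N without by, in which case it counts the rows selected by i. Grouped counts can then be chained with a second query to sort them.

```r
library(data.table)
dt <- as.data.table(iris)

dt[, .N]                           # total number of rows
dt[Sepal.Length > 7, .N]           # number of rows with Sepal.Length above 7
dt[, .N, by = Species][order(-N)]  # counts per species, largest first
```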
You can also group by multiple columns, or by expressions, using the list() or .() notation. You can read the code below as calculating the mean of Sepal.Length and the number of instances (given by .N), grouped by Species and by whether Sepal.Length < 6.
dt[, .(Sepal.Length.Mean = mean(Sepal.Length), .N),
by = .(Species, Sepal.Length < 6)
]
## Species Sepal.Length < 6 Sepal.Length.Mean N
## <fctr> <lgcl> <num> <int>
## 1: setosa TRUE 5.006000 50
## 2: versicolor FALSE 6.375000 24
## 3: versicolor TRUE 5.530769 26
## 4: virginica FALSE 6.741860 43
## 5: virginica TRUE 5.642857 7
data.tables add, update, and delete columns by reference, avoiding redundant copies and improving performance. You can use the := operator in j to add, update, and delete columns by reference. There are two forms for using :=, namely dt[, LHS := RHS] and dt[, `:=`(LHS = RHS)].
dt <- as.data.table(iris)
dt[, Sepal.Sum := Sepal.Length + Sepal.Width]
head(dt)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Sum
## <num> <num> <num> <num> <fctr> <num>
## 1: 5.1 3.5 1.4 0.2 setosa 8.6
## 2: 4.9 3.0 1.4 0.2 setosa 7.9
## 3: 4.7 3.2 1.3 0.2 setosa 7.9
## 4: 4.6 3.1 1.5 0.2 setosa 7.7
## 5: 5.0 3.6 1.4 0.2 setosa 8.6
## 6: 5.4 3.9 1.7 0.4 setosa 9.3
dt[, `:=`(Petal.Sum = Petal.Length + Petal.Width)]
head(dt)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Sum
## <num> <num> <num> <num> <fctr> <num>
## 1: 5.1 3.5 1.4 0.2 setosa 8.6
## 2: 4.9 3.0 1.4 0.2 setosa 7.9
## 3: 4.7 3.2 1.3 0.2 setosa 7.9
## 4: 4.6 3.1 1.5 0.2 setosa 7.7
## 5: 5.0 3.6 1.4 0.2 setosa 8.6
## 6: 5.4 3.9 1.7 0.4 setosa 9.3
## Petal.Sum
## <num>
....
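The functional form is handy for adding several columns in one call. Assigning NULL with := deletes columns, again by reference. Sepal.Ratio and Petal.Ratio are hypothetical columns used only for illustration.

```r
library(data.table)
dt <- as.data.table(iris)

# add two columns at once with the functional form
dt[, `:=`(
  Sepal.Ratio = Sepal.Length / Sepal.Width,
  Petal.Ratio = Petal.Length / Petal.Width
)]
names(dt)  # now ends with "Sepal.Ratio" "Petal.Ratio"

# delete both columns by reference
dt[, c("Sepal.Ratio", "Petal.Ratio") := NULL]
names(dt)  # back to the five original iris columns
```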
Note that in the code above, we do not need to assign the result back to a variable, because the modification is done by reference, i.e., in place. In other words, we are modifying dt itself and not a copy of dt. Therefore, if you run the entire code chunk above, dt will contain both the Sepal.Sum and Petal.Sum columns.
Since := is used in j, it can be combined with i and by, just like the other j expressions earlier in this subsection.
dt[Species == "versicolor" | Species == "virginica", Sepal.Length := 0]
head(dt)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Sum
## <num> <num> <num> <num> <fctr> <num>
## 1: 5.1 3.5 1.4 0.2 setosa 8.6
## 2: 4.9 3.0 1.4 0.2 setosa 7.9
## 3: 4.7 3.2 1.3 0.2 setosa 7.9
## 4: 4.6 3.1 1.5 0.2 setosa 7.7
## 5: 5.0 3.6 1.4 0.2 setosa 8.6
## 6: 5.4 3.9 1.7 0.4 setosa 9.3
## Petal.Sum
## <num>
....
dt[, Sepal.Length.Mean := mean(Sepal.Length),
by = .(Species)
]
head(dt)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Sum
## <num> <num> <num> <num> <fctr> <num>
## 1: 5.1 3.5 1.4 0.2 setosa 8.6
## 2: 4.9 3.0 1.4 0.2 setosa 7.9
## 3: 4.7 3.2 1.3 0.2 setosa 7.9
## 4: 4.6 3.1 1.5 0.2 setosa 7.7
## 5: 5.0 3.6 1.4 0.2 setosa 8.6
## 6: 5.4 3.9 1.7 0.4 setosa 9.3
## Petal.Sum Sepal.Length.Mean
## <num> <num>
....
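Because := works by reference, assigning a data.table to a new name does not copy it. Both names point to the same object, so modifying one modifies the other. When you need an independent copy, use copy(). dt_ref, dt_copy, and the Flag column are hypothetical names used for illustration.

```r
library(data.table)
dt <- as.data.table(iris)

dt_ref <- dt          # another name for the same data.table, not a copy
dt_copy <- copy(dt)   # an independent deep copy

dt_ref[, Flag := TRUE]      # modifies the shared object by reference
"Flag" %in% names(dt)       # TRUE: dt changed as well
"Flag" %in% names(dt_copy)  # FALSE: the copy is unaffected
```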
You can find out more about data.tables by typing vignette(package = "data.table") into the console.