Package 'tidyft' reference manual

Title:	Fast and Memory Efficient Data Operations in Tidy Syntax
Description:	Tidy syntax for 'data.table', using modification by reference whenever possible. This toolkit is designed for big data analysis in high-performance desktop or laptop computers. The syntax of the package is similar or identical to 'tidyverse'. It is user friendly, memory efficient and time saving. For more information, check its ancestor package 'tidyfst'.
Authors:	Tian-Yuan Huang [aut, cre]
Maintainer:	Tian-Yuan Huang <[email protected]>
License:	MIT + file LICENSE
Version:	0.9.20
Built:	2025-01-21 04:41:03 UTC
Source:	https://github.com/hope-data-science/tidyft

Arrange entries in data.frame

Description

Analogous function for arrange in dplyr.

Usage

arrange(.data, ..., cols = NULL, order = 1L)
arrange(.data, ..., cols = NULL, order = 1L)

Arguments

`.data`	data.frame
`...`	Arrange by what group? Minus symbol means arrange by descending order.
`cols`	For `set_arrange` only. A character vector of column names of `.data` by which to order. If present, override `...`. Defaults to `NULL`.
`order`	For `set_arrange` only. An integer vector with only possible values of 1 and -1, corresponding to ascending and descending order. Defaults to 1.

Details

Once arranged, the order of entries would be changed forever.

Value

A data.table

Examples


a = as.data.table(iris)
a %>% arrange(Sepal.Length)
a
a %>% arrange(cols = c("Sepal.Width","Petal.Length"))
a

a = as.data.table(iris)
a %>% arrange(Sepal.Length)
a
a %>% arrange(cols = c("Sepal.Width","Petal.Length"))
a

Save a data.frame as a fst table

Description

This function first export the data.frame to a temporal file, and then parse it back as a fst table (class name is "fst_table").

Usage

as_fst(.data)
as_fst(.data)

Arguments

.data

A data.frame

Value

An object of class fst_table

Examples



  iris %>%
    as_fst() -> iris_fst
  iris_fst

iris %>%
    as_fst() -> iris_fst
  iris_fst

Complete a data frame with missing combinations of data

Description

Turns implicit missing values into explicit missing values. Analogous function for complete function in tidyr.

Usage

complete(.data, ..., fill = NA)
complete(.data, ..., fill = NA)

Arguments

`.data`	data.frame
`...`	Specification of columns to expand.The selection of columns is supported by the flexible `select_dt`. To find all unique combinations of provided columns, including those not found in the data, supply each variable as a separate argument. But the two modes (select the needed columns and fill outside values) could not be mixed, find more details in examples.
`fill`	Atomic value to fill into the missing cell, default uses `NA`.

Details

When the provided columns with addtion data are of different length, all the unique combinations would be returned. This operation should be used only on unique entries, and it will always returned the unique entries.

If you supply fill parameter, these values will also replace existing explicit missing values in the data set.

Value

data.table

Examples

df <- data.table(
  group = c(1:2, 1),
  item_id = c(1:2, 2),
  item_name = c("a", "b", "b"),
  value1 = 1:3,
  value2 = 4:6
)

df %>% complete(item_id,item_name)
df %>% complete(item_id,item_name,fill = 0)
df %>% complete("item")
df %>% complete(item_id=1:3)
df %>% complete(item_id=1:3,group=1:2)
df %>% complete(item_id=1:3,group=1:3,item_name=c("a","b","c"))

df <- data.table(
  group = c(1:2, 1),
  item_id = c(1:2, 2),
  item_name = c("a", "b", "b"),
  value1 = 1:3,
  value2 = 4:6
)

df %>% complete(item_id,item_name)
df %>% complete(item_id,item_name,fill = 0)
df %>% complete("item")
df %>% complete(item_id=1:3)
df %>% complete(item_id=1:3,group=1:2)
df %>% complete(item_id=1:3,group=1:3,item_name=c("a","b","c"))

Count observations by group

Description

Analogous function for count and add_count in dplyr.

Usage

count(.data, ..., sort = FALSE, name = "n")

add_count(.data, ..., name = "n")
count(.data, ..., sort = FALSE, name = "n")

add_count(.data, ..., name = "n")

Arguments

`.data`	data.table
`...`	variables to group by.
`sort`	logical. If TRUE result will be sorted in desending order by resulting variable.
`name`	character. Name of resulting variable. Default uses "n".

Value

data.table

Examples

a = as.data.table(mtcars)
count(a,cyl)
count(a,cyl,sort = TRUE)
a

b = as.data.table(iris)
b %>% add_count(Species,name = "N")
b
a = as.data.table(mtcars)
count(a,cyl)
count(a,cyl,sort = TRUE)
a

b = as.data.table(iris)
b %>% add_count(Species,name = "N")
b

Cumulative mean

Description

Returns a vector whose elements are the cumulative mean of the elements of the argument.

Usage

cummean(x)
cummean(x)

Arguments

`x`	a numeric or complex object, or an object that can be coerced to one of these.

Value

A numeric vector

Examples

cummean(1:10)

cummean(1:10)

Select distinct/unique rows in data.table

Description

Analogous function for distinct in dplyr

Usage

distinct(.data, ..., .keep_all = FALSE)
distinct(.data, ..., .keep_all = FALSE)

Arguments

`.data`	data.table
`...`	Optional variables to use when determining uniqueness. If there are multiple rows for a given combination of inputs, only the first row will be preserved. If omitted, will use all variables.
`.keep_all`	If `TRUE`, keep all variables in data.table. If a combination of ... is not distinct, this keeps the first row of values.

Value

data.table

Examples


 a = as.data.table(iris)
 b = as.data.table(mtcars)
 a %>% distinct(Species)
 b %>% distinct(cyl,vs,.keep_all = TRUE)


a = as.data.table(iris)
 b = as.data.table(mtcars)
 a %>% distinct(Species)
 b %>% distinct(cyl,vs,.keep_all = TRUE)

Drop or delete data by rows or columns

Description

drop_na drops entries by specified columns. delete_na deletes rows or columns with too many NAs.

Usage

drop_na(.data, ...)

delete_na(.data, MARGIN, n)
drop_na(.data, ...)

delete_na(.data, MARGIN, n)

Arguments

`.data`	A data.table
`...`	Colunms to be dropped or deleted.
`MARGIN`	1 or 2. 1 for deleting rows, 2 for deleting columns.
`n`	If number (proportion) of NAs is larger than or equal to "n", the columns/rows would be deleted. When smaller than 1, use as proportion. When larger or equal to 1, use as number.

Value

A data.table

Examples

x = data.table(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5),z = rep(NA,4))
x
x %>% delete_na(2,0.75)

x = data.table(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5),z = rep(NA,4))
x %>% delete_na(2,0.5)

x = data.table(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5),z = rep(NA,4))
x %>% delete_na(2,0.24)

x = data.table(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5),z = rep(NA,4))
x %>% delete_na(2,2)

x = data.table(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5),z = rep(NA,4))
x %>% delete_na(1,0.6)
x = data.table(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5),z = rep(NA,4))
x %>% delete_na(1,2)


x = data.table(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5),z = rep(NA,4))
x
x %>% delete_na(2,0.75)

x = data.table(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5),z = rep(NA,4))
x %>% delete_na(2,0.5)

x = data.table(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5),z = rep(NA,4))
x %>% delete_na(2,0.24)

x = data.table(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5),z = rep(NA,4))
x %>% delete_na(2,2)

x = data.table(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5),z = rep(NA,4))
x %>% delete_na(1,0.6)
x = data.table(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5),z = rep(NA,4))
x %>% delete_na(1,2)

Fast creation of dummy variables

Description

Quickly create dummy (binary) columns from character and factor type columns in the inputted data (and numeric columns if specified.) This function is useful for statistical analysis when you want binary columns rather than character columns.

Usage

dummy(.data, ..., longname = TRUE)
dummy(.data, ..., longname = TRUE)

Arguments

`.data`	data.frame
`...`	Columns you want to create dummy variables from. Very flexible, find in the examples.
`longname`	logical. Should the output column labeled with the original column name? Default uses `TRUE`.

Details

If no columns provided, will return the original data frame.

This function is inspired by fastDummies package, but provides simple and precise usage, whereas fastDummies::dummy_cols provides more features for statistical usage.

Value

data.table

Examples


iris = as.data.table(iris)
iris %>% dummy(Species)
iris %>% dummy(Species,longname = FALSE)

mtcars = as.data.table(mtcars)
mtcars %>% head() %>% dummy(vs,am)
mtcars %>% head() %>% dummy("cyl|gear")

iris = as.data.table(iris)
iris %>% dummy(Species)
iris %>% dummy(Species,longname = FALSE)

mtcars = as.data.table(mtcars)
mtcars %>% head() %>% dummy(vs,am)
mtcars %>% head() %>% dummy("cyl|gear")

Read and write fst files

Description

Wrapper for read_fst and write_fst from fst, but use a different default. For data import, always return a data.table. For data export, always compress the data to the smallest size.

Usage

export_fst(x, path, compress = 100, uniform_encoding = TRUE)

import_fst(
  path,
  columns = NULL,
  from = 1,
  to = NULL,
  as.data.table = TRUE,
  old_format = FALSE
)
export_fst(x, path, compress = 100, uniform_encoding = TRUE)

import_fst(
  path,
  columns = NULL,
  from = 1,
  to = NULL,
  as.data.table = TRUE,
  old_format = FALSE
)

Arguments

`x`	a data frame to write to disk
`path`	path to fst file
`compress`	value in the range 0 to 100, indicating the amount of compression to use. Lower values mean larger file sizes. The default compression is set to 50.
`uniform_encoding`	If 'TRUE', all character vectors will be assumed to have elements with equal encoding. The encoding (latin1, UTF8 or native) of the first non-NA element will used as encoding for the whole column. This will be a correct assumption for most use cases. If 'uniform.encoding' is set to 'FALSE', no such assumption will be made and all elements will be converted to the same encoding. The latter is a relatively expensive operation and will reduce write performance for character columns.
`columns`	Column names to read. The default is to read all columns.
`from`	Read data starting from this row number.
`to`	Read data up until this row number. The default is to read to the last row of the stored dataset.
`as.data.table`	If TRUE, the result will be returned as a `data.table` object. Any keys set on dataset `x` before writing will be retained. This allows for storage of sorted datasets. This option requires `data.table` package to be installed.
`old_format`	must be FALSE, the old fst file format is deprecated and can only be read and converted with fst package versions 0.8.0 to 0.8.10.

Value

'import_fst' returns a data.table with the selected columns and rows. 'export_fst' writes 'x' to a 'fst' file and invisibly returns 'x' (so you can use this function in a pipeline).

Examples



export_fst(iris,"iris_fst_test.fst")
iris_dt = import_fst("iris_fst_test.fst")
iris_dt
unlink("iris_fst_test.fst")


export_fst(iris,"iris_fst_test.fst")
iris_dt = import_fst("iris_fst_test.fst")
iris_dt
unlink("iris_fst_test.fst")

Fill in missing values with previous or next value

Description

Fills missing values in selected columns using the next or previous entry.

Usage

fill(.data, ..., direction = "down")

shift_fill(x, direction = "down")
fill(.data, ..., direction = "down")

shift_fill(x, direction = "down")

Arguments

`.data`	A data.table
`...`	A selection of columns.
`direction`	Direction in which to fill missing values. Currently either "down" (the default), "up".
`x`	A vector.

Details

fill is filling data.table's columns, shift_fill is filling any vectors.

Value

A filled data.table

Examples


df <- data.table(Month = 1:12, Year = c(2000, rep(NA, 10),2001))
df
df %>% fill(Year)

df <- data.table(Month = 1:12, Year = c(2000, rep(NA, 10),2001))
df %>% fill(Year,direction = "up")

df <- data.table(Month = 1:12, Year = c(2000, rep(NA, 10),2001))
df
df %>% fill(Year)

df <- data.table(Month = 1:12, Year = c(2000, rep(NA, 10),2001))
df %>% fill(Year,direction = "up")

Filter entries in data.frame

Description

Analogous function for filter in dplyr.

Usage

filter(.data, ...)
filter(.data, ...)

Arguments

`.data`	data.frame
`...`	List of variables or name-value pairs of summary/modifications functions.

Details

Currently data.table is not able to delete rows by reference,

Value

A data.table

References

https://github.com/Rdatatable/data.table/issues/635

https://stackoverflow.com/questions/10790204/how-to-delete-a-row-by-reference-in-data-table

Examples

iris = as.data.table(iris)
iris %>% filter(Sepal.Length > 7)
iris %>% filter(Sepal.Length > 7,Sepal.Width > 3)
iris %>% filter(Sepal.Length > 7 & Sepal.Width > 3)
iris %>% filter(Sepal.Length == max(Sepal.Length))
iris = as.data.table(iris)
iris %>% filter(Sepal.Length > 7)
iris %>% filter(Sepal.Length > 7,Sepal.Width > 3)
iris %>% filter(Sepal.Length > 7 & Sepal.Width > 3)
iris %>% filter(Sepal.Length == max(Sepal.Length))

Parse,inspect and extract data.table from fst file

Description

An API for reading fst file as data.table.

Usage

parse_fst(path)

slice_fst(ft, row_no)

select_fst(ft, ...)

filter_fst(ft, ...)

summary_fst(ft)
parse_fst(path)

slice_fst(ft, row_no)

select_fst(ft, ...)

filter_fst(ft, ...)

summary_fst(ft)

Arguments

`path`	path to fst file
`ft`	An object of class fst_table, returned by `parse_fst`
`row_no`	An integer vector (Positive)
`...`	The filter conditions

Details

summary_fst could provide some basic information about the fst table.

Value

parse_fst returns a fst_table class.

select_fst and filter_fst returns a data.table.

Examples


  # write the file first
  path = tempfile(fileext = ".fst")
  fst::write_fst(iris,path)
  # parse the file but not reading it
  parse_fst(path) -> ft

  ft

  class(ft)
  lapply(ft,class)
  names(ft)
  dim(ft)
  summary_fst(ft)

  # get the data by query
  ft %>% slice_fst(1:3)
  ft %>% slice_fst(c(1,3))

  ft %>% select_fst(Sepal.Length)
  ft %>% select_fst(Sepal.Length,Sepal.Width)
  ft %>% select_fst("Sepal.Length")
  ft %>% select_fst(1:3)
  ft %>% select_fst(1,3)
  ft %>% select_fst("Se")

  # return a warning with message
  
    ft %>% select_fst("nothing")
  

  ft %>% select_fst("Se|Sp")
  ft %>% select_fst(cols = names(iris)[2:3])

  ft %>% filter_fst(Sepal.Width > 3)
  ft %>% filter_fst(Sepal.Length > 6 , Species == "virginica")
  ft %>% filter_fst(Sepal.Length > 6 & Species == "virginica" & Sepal.Width < 3)

# write the file first
  path = tempfile(fileext = ".fst")
  fst::write_fst(iris,path)
  # parse the file but not reading it
  parse_fst(path) -> ft

  ft

  class(ft)
  lapply(ft,class)
  names(ft)
  dim(ft)
  summary_fst(ft)

  # get the data by query
  ft %>% slice_fst(1:3)
  ft %>% slice_fst(c(1,3))

  ft %>% select_fst(Sepal.Length)
  ft %>% select_fst(Sepal.Length,Sepal.Width)
  ft %>% select_fst("Sepal.Length")
  ft %>% select_fst(1:3)
  ft %>% select_fst(1,3)
  ft %>% select_fst("Se")

  # return a warning with message
  
    ft %>% select_fst("nothing")
  

  ft %>% select_fst("Se|Sp")
  ft %>% select_fst(cols = names(iris)[2:3])

  ft %>% filter_fst(Sepal.Width > 3)
  ft %>% filter_fst(Sepal.Length > 6 , Species == "virginica")
  ft %>% filter_fst(Sepal.Length > 6 & Species == "virginica" & Sepal.Width < 3)

Group by one or more variables

Description

Most data operations are done on groups defined by variables. group_by will group the data.table by selected variables (setting them as keys), and arrange them in ascending order. group_exe could do computations by group, it receives an object returned by group_by.

Usage

group_by(.data, ...)

group_exe(.data, ...)

groups(x)

ungroup(x)
group_by(.data, ...)

group_exe(.data, ...)

groups(x)

ungroup(x)

Arguments

`.data`	A data.table
`...`	For `group_by`:Variables to group by. For `group_exe`:Any data manipulation arguments that could be implemented on a data.table.
`x`	A data.table

Details

For mutate and summarise, it is recommended to use the innate "by" parameter, which is faster. Once the data.table is grouped, the order is changed forever.

groups() could return a character vector of specified groups.

ungroup() would delete the keys in data.table.

Value

A data.table with keys

Examples

a = as.data.table(iris)
a
a %>%
  group_by(Species) %>%
  group_exe(
    head(3)
  )
groups(a)
ungroup(a)
groups(a)
a = as.data.table(iris)
a
a %>%
  group_by(Species) %>%
  group_exe(
    head(3)
  )
groups(a)
ungroup(a)
groups(a)

Join tables

Description

The mutating joins add columns from 'y' to 'x', matching rows based on the keys:

* 'inner_join()': includes all rows in 'x' and 'y'. * 'left_join()': includes all rows in 'x'. * 'right_join()': includes all rows in 'y'. * 'full_join()': includes all rows in 'x' or 'y'.

Filtering joins filter rows from 'x' based on the presence or absence of matches in 'y':

* 'semi_join()' return all rows from 'x' with a match in 'y'. * 'anti_join()' return all rows from 'x' without a match in 'y'.

Usage

inner_join(x, y, by = NULL, on = NULL)

left_join(x, y, by = NULL, on = NULL)

right_join(x, y, by = NULL, on = NULL)

full_join(x, y, by = NULL, on = NULL)

anti_join(x, y, by = NULL, on = NULL)

semi_join(x, y, by = NULL, on = NULL)
inner_join(x, y, by = NULL, on = NULL)

left_join(x, y, by = NULL, on = NULL)

right_join(x, y, by = NULL, on = NULL)

full_join(x, y, by = NULL, on = NULL)

anti_join(x, y, by = NULL, on = NULL)

semi_join(x, y, by = NULL, on = NULL)

Arguments

`x`	A data.table
`y`	A data.table
`by`	(Optional) A character vector of variables to join by. If 'NULL', the default, '*_join()' will perform a natural join, using all variables in common across 'x' and 'y'. A message lists the variables so that you can check they're correct; suppress the message by supplying 'by' explicitly. To join by different variables on 'x' and 'y', use a named vector. For example, 'by = c("a" = "b")' will match 'x$a' to 'y$b'. To join by multiple variables, use a vector with length > 1. For example, 'by = c("a", "b")' will match 'x$a' to 'y$a' and 'x$b' to 'y$b'. Use a named vector to match different variables in 'x' and 'y'. For example, 'by = c("a" = "b", "c" = "d")' will match 'x$a' to 'y$b' and 'x$c' to 'y$d'.
`on`	(Optional) Indicate which columns in x should be joined with which columns in y. Examples included: 1.`.by = c("a","b")` (this is a must for `set_full_join`); 2.`.by = c(x1="y1", x2="y2")`; 3.`.by = c("x1==y1", "x2==y2")`; 4.`.by = c("a", V2="b")`; 5.`.by = .(a, b)`; 6.`.by = c("x>=a", "y<=b")` or `.by = .(x>=a, y<=b)`.

Value

A data.table

Examples


workers = fread("
    name company
    Nick Acme
    John Ajax
    Daniela Ajax
")

positions = fread("
    name position
    John designer
    Daniela engineer
    Cathie manager
")

workers %>% inner_join(positions)
workers %>% left_join(positions)
workers %>% right_join(positions)
workers %>% full_join(positions)

# filtering joins
workers %>% anti_join(positions)
workers %>% semi_join(positions)

# To suppress the message, supply 'by' argument
workers %>% left_join(positions, by = "name")

# Use a named 'by' if the join variables have different names
positions2 = setNames(positions, c("worker", "position")) # rename first column in 'positions'
workers %>% inner_join(positions2, by = c("name" = "worker"))

# the syntax of 'on' could be a bit different
workers %>% inner_join(positions2,on = "name==worker")


workers = fread("
    name company
    Nick Acme
    John Ajax
    Daniela Ajax
")

positions = fread("
    name position
    John designer
    Daniela engineer
    Cathie manager
")

workers %>% inner_join(positions)
workers %>% left_join(positions)
workers %>% right_join(positions)
workers %>% full_join(positions)

# filtering joins
workers %>% anti_join(positions)
workers %>% semi_join(positions)

# To suppress the message, supply 'by' argument
workers %>% left_join(positions, by = "name")

# Use a named 'by' if the join variables have different names
positions2 = setNames(positions, c("worker", "position")) # rename first column in 'positions'
workers %>% inner_join(positions2, by = c("name" = "worker"))

# the syntax of 'on' could be a bit different
workers %>% inner_join(positions2,on = "name==worker")

Fast lead/lag for vectors

Description

Analogous function for lead and lag in dplyr by wrapping data.table's shift.

Usage

lead(x, n = 1L, fill = NA)

lag(x, n = 1L, fill = NA)
lead(x, n = 1L, fill = NA)

lag(x, n = 1L, fill = NA)

Arguments

`x`	A vector
`n`	a positive integer of length 1, giving the number of positions to lead or lag by. Default uses 1
`fill`	Value to use for padding when the window goes beyond the input length. Default uses `NA`

Value

A vector

Examples

lead(1:5)
lag(1:5)
lead(1:5,2)
lead(1:5,n = 2,fill = 0)
lead(1:5)
lag(1:5)
lead(1:5,2)
lead(1:5,n = 2,fill = 0)

Pivot data between long and wide

Description

Fast table pivoting from long to wide and from wide to long. These functions are supported by dcast.data.table and melt.data.table from data.table.

Usage

longer(.data, ..., name = "name", value = "value", na.rm = FALSE)

wider(.data, ..., name, value = NULL, fun = NULL, fill = NA)
longer(.data, ..., name = "name", value = "value", na.rm = FALSE)

wider(.data, ..., name, value = NULL, fun = NULL, fill = NA)

Arguments

`.data`	A data.table
`...`	Columns for unchanged group. Flexible, see examples.
`name`	Name for the measured variable names column.
`value`	Name for the data values column(s).
`na.rm`	If `TRUE`, `NA` values will be removed from the molten data.
`fun`	Should the data be aggregated before casting? Defaults to `NULL`, which uses `length` for aggregation. If a function is provided, with aggregated by this function.
`fill`	Value with which to fill missing cells. Default uses `NA`.

Value

A data.table

Examples


stocks <- data.table(
  time = as.Date('2009-01-01') + 0:9,
  X = rnorm(10, 0, 1),
  Y = rnorm(10, 0, 2),
  Z = rnorm(10, 0, 4)
)

stocks %>% longer(time)
stocks %>% longer(-(2:4)) # same
stocks %>% longer(-"X|Y|Z") # same
long_stocks = longer(stocks,"ti") # same as above except for assignment

long_stocks %>% wider(time,name = "name",value = "value")

# the unchanged group could be missed if all the rest will be used
long_stocks %>% wider(name = "name",value = "value")

stocks <- data.table(
  time = as.Date('2009-01-01') + 0:9,
  X = rnorm(10, 0, 1),
  Y = rnorm(10, 0, 2),
  Z = rnorm(10, 0, 4)
)

stocks %>% longer(time)
stocks %>% longer(-(2:4)) # same
stocks %>% longer(-"X|Y|Z") # same
long_stocks = longer(stocks,"ti") # same as above except for assignment

long_stocks %>% wider(time,name = "name",value = "value")

# the unchanged group could be missed if all the rest will be used
long_stocks %>% wider(name = "name",value = "value")

Conversion between tidy table and named matrix

Description

Convenient fucntions to implement conversion between tidy table and named matrix.

Usage

mat_df(m)

df_mat(df, row, col, value)
mat_df(m)

df_mat(df, row, col, value)

Arguments

`m`	A matrix
`df`	A data.frame with at least 3 columns, one for row name, one for column name, and one for values. The names for column and row should be unique.
`row`	Unquoted expression of column name for row
`col`	Unquoted expression of column name for column
`value`	Unquoted expression of column name for values

Value

For mat_df, a data.frame. For df_mat, a named matrix.

Examples


mm = matrix(c(1:8,NA),ncol = 3,dimnames = list(letters[1:3],LETTERS[1:3]))
mm
tdf = mat_df(mm)
tdf
mat = df_mat(tdf,row,col,value)
setequal(mm,mat)

tdf %>%
  setNames(c("A","B","C")) %>%
  df_mat(A,B,C)

mm = matrix(c(1:8,NA),ncol = 3,dimnames = list(letters[1:3],LETTERS[1:3]))
mm
tdf = mat_df(mm)
tdf
mat = df_mat(tdf,row,col,value)
setequal(mm,mat)

tdf %>%
  setNames(c("A","B","C")) %>%
  df_mat(A,B,C)

Create or transform variables

Description

mutate() adds new variables and preserves existing ones; transmute() adds new variables and drops existing ones. Both functions preserve the number of rows of the input. New variables overwrite existing variables of the same name.

mutate_when integrates mutate and case_when in dplyr and make a new tidy verb for data.table. mutate_vars is a super function to do updates in specific columns according to conditions.

If you mutate a data.table, it is forever changed. No copies made, which is efficient, but should be used with caution. If you still want the keep the original data.table, use copy first.

Usage

mutate(.data, ..., by)

transmute(.data, ..., by)

mutate_when(.data, when, ..., by)

mutate_vars(.data, .cols = NULL, .func, ..., by)
mutate(.data, ..., by)

transmute(.data, ..., by)

mutate_when(.data, when, ..., by)

mutate_vars(.data, .cols = NULL, .func, ..., by)

Arguments

`.data`	A data.table
`...`	Name-value pairs of expressions
`by`	(Optional) Mutate by what group?
`when`	An object which can be coerced to logical mode
`.cols`	Any types that can be accepted by `select_dt`.
`.func`	Function to be run within each column, should return a value or vectors with same length.

Value

A data.table

Examples

  # Newly created variables are available immediately
  a = as.data.table(mtcars)
  copy(a) %>% mutate(cyl2 = cyl * 2)
  a

  # change forever
  a %>% mutate(cyl2 = cyl * 2)
  a

  # You can also use mutate() to remove variables and
  # modify existing variables
  a %>% mutate(
    mpg = NULL,
    disp = disp * 0.0163871 # convert to litres
  )

  a %>% transmute(cyl,one = 1)
  a


 iris[3:8,] %>%
   as.data.table() %>%
   mutate_when(Petal.Width == .2,
               one = 1,Sepal.Length=2)

 iris[3:8,] %>%
   as.data.table() %>%
   mutate_vars("Pe",scale)

# Newly created variables are available immediately
  a = as.data.table(mtcars)
  copy(a) %>% mutate(cyl2 = cyl * 2)
  a

  # change forever
  a %>% mutate(cyl2 = cyl * 2)
  a

  # You can also use mutate() to remove variables and
  # modify existing variables
  a %>% mutate(
    mpg = NULL,
    disp = disp * 0.0163871 # convert to litres
  )

  a %>% transmute(cyl,one = 1)
  a


 iris[3:8,] %>%
   as.data.table() %>%
   mutate_when(Petal.Width == .2,
               one = 1,Sepal.Length=2)

 iris[3:8,] %>%
   as.data.table() %>%
   mutate_vars("Pe",scale)

Nest and unnest

Description

Analogous function for nest and unnest in tidyr. unnest will automatically remove other list-columns except for the target list-columns (which would be unnested later). Also, squeeze is designed to merge multiple columns into list column.

Usage

nest(.data, ..., mcols = NULL, .name = "ndt")

unnest(.data, ...)

squeeze(.data, ..., .name = "ndt")

chop(.data, ...)

unchop(.data, ...)
nest(.data, ..., mcols = NULL, .name = "ndt")

unnest(.data, ...)

squeeze(.data, ..., .name = "ndt")

chop(.data, ...)

unchop(.data, ...)

Arguments

`.data`	data.table, nested or unnested
`...`	The variables for nest group(for `nest`), columns to be nested(for `squeeze` and `chop`), or column(s) to be unnested(for `unnest`). Could recieve anything that `select_dt` could receive.
`mcols`	Name-variable pairs in the list, form like
`.name`	Character. The nested column name. Defaults to "ndt". `list(petal="^Pe",sepal="^Se")`, see example.

Details

In the nest, the data would be nested to a column named 'ndt', which is short for nested data.table.

The squeeze would not remove the originial columns.

The unchop is the reverse operation of chop.

These functions are experiencing the experimental stage, especially the unnest. If they don't work on some circumtances, try tidyr package.

Value

data.table, nested or unnested

References

https://www.r-bloggers.com/much-faster-unnesting-with-data-table/

https://stackoverflow.com/questions/25430986/create-nested-data-tables-by-collapsing-rows-into-new-data-tables

Examples


mtcars = as.data.table(mtcars)
iris = as.data.table(iris)

# examples for nest

# nest by which columns?
 mtcars %>% nest(cyl)
 mtcars %>% nest("cyl")
 mtcars %>% nest(cyl,vs)
 mtcars %>% nest(vs:am)
 mtcars %>% nest("cyl|vs")
 mtcars %>% nest(c("cyl","vs"))

# nest two columns directly
iris %>% nest(mcols = list(petal="^Pe",sepal="^Se"))

# nest more flexibly
iris %>% nest(mcols = list(ndt1 = 1:3,
  ndt2 = "Pe",
  ndt3 = Sepal.Length:Sepal.Width))

# examples for unnest
# unnest which column?
 mtcars %>% nest("cyl|vs") %>%
   unnest(ndt)
 mtcars %>% nest("cyl|vs") %>%
   unnest("ndt")

df <- data.table(
  a = list(c("a", "b"), "c"),
  b = list(c(TRUE,TRUE),FALSE),
  c = list(3,c(1,2)),
  d = c(11, 22)
)

df
df %>% unnest(a)
df %>% unnest(2)
df %>% unnest("c")
df %>% unnest(cols = names(df)[3])

# You can unnest multiple columns simultaneously
df %>% unnest(1:3)
df %>% unnest(a,b,c)
df %>% unnest("a|b|c")

# examples for squeeze
# nest which columns?
iris %>% squeeze(1:2)
iris %>% squeeze("Se")
iris %>% squeeze(Sepal.Length:Petal.Width)

# examples for chop
df <- data.table(x = c(1, 1, 1, 2, 2, 3), y = 1:6, z = 6:1)
df %>% chop(y,z)
df %>% chop(y,z) %>% unchop(y,z)
mtcars = as.data.table(mtcars)
iris = as.data.table(iris)

# examples for nest

# nest by which columns?
 mtcars %>% nest(cyl)
 mtcars %>% nest("cyl")
 mtcars %>% nest(cyl,vs)
 mtcars %>% nest(vs:am)
 mtcars %>% nest("cyl|vs")
 mtcars %>% nest(c("cyl","vs"))

# nest two columns directly
iris %>% nest(mcols = list(petal="^Pe",sepal="^Se"))

# nest more flexibly
iris %>% nest(mcols = list(ndt1 = 1:3,
  ndt2 = "Pe",
  ndt3 = Sepal.Length:Sepal.Width))

# examples for unnest
# unnest which column?
 mtcars %>% nest("cyl|vs") %>%
   unnest(ndt)
 mtcars %>% nest("cyl|vs") %>%
   unnest("ndt")

df <- data.table(
  a = list(c("a", "b"), "c"),
  b = list(c(TRUE,TRUE),FALSE),
  c = list(3,c(1,2)),
  d = c(11, 22)
)

df
df %>% unnest(a)
df %>% unnest(2)
df %>% unnest("c")
df %>% unnest(cols = names(df)[3])

# You can unnest multiple columns simultaneously
df %>% unnest(1:3)
df %>% unnest(a,b,c)
df %>% unnest("a|b|c")

# examples for squeeze
# nest which columns?
iris %>% squeeze(1:2)
iris %>% squeeze("Se")
iris %>% squeeze(Sepal.Length:Petal.Width)

# examples for chop
df <- data.table(x = c(1, 1, 1, 2, 2, 3), y = 1:6, z = 6:1)
df %>% chop(y,z)
df %>% chop(y,z) %>% unchop(y,z)

Extract the nth value from a vector

Description

Get the value from a vector with its position.

Usage

nth(v, n = 1)
nth(v, n = 1)

Arguments

`v`	A vector
`n`	A single integer specifying the position. Default uses `1`. Negative integers index from the end (i.e. -1L will return the last value in the vector). If a double is supplied, it will be silently truncated.

Value

A single value.

Examples


x = 1:10
nth(x, 1)
nth(x, 5)
nth(x, -2)

x = 1:10
nth(x, 1)
nth(x, 5)
nth(x, -2)

Nice printing of report the Space Allocated for an Object

Description

Provides an estimate of the memory that is being used to store an R object. A wrapper of 'object.size', but use a nicer printing unit.

Usage

object_size(object)
object_size(object)

Arguments

object

an R object.

Value

An object of class "object_size"

Examples


iris %>% object_size()

iris %>% object_size()

Pull out a single variable

Description

Analogous function for pull in dplyr

Usage

pull(.data, col)
pull(.data, col)

Arguments

`.data`	data.frame
`col`	A name of column or index (should be positive).

Value

A vector

Examples

mtcars %>% pull(2)
mtcars %>% pull(cyl)
mtcars %>% pull("cyl")
mtcars %>% pull(2)
mtcars %>% pull(cyl)
mtcars %>% pull("cyl")

Convenient file reader

Description

A wrapper of fread in data.table. Highlighting the encoding.

Usage

read_csv(path, utf8 = FALSE, ...)
read_csv(path, utf8 = FALSE, ...)

Arguments

`path`	File name in working directory, path to file.
`utf8`	Should "UTF-8" used as the encoding? (Defaults to `FALSE`)
`...`	Other parameters passed to `data.table::fread`.

Value

A data.table

Change column order

Description

Use 'relocate()' to change column positions, using the same syntax as 'select()'. Check similar function as 'relocate()' in dplyr.

Usage

relocate(.data, ..., how = "first", where = NULL)
relocate(.data, ..., how = "first", where = NULL)

Arguments

`.data`	A data.table
`...`	Columns to move
`how`	The mode of movement, including "first","last","after","before". Default uses "first".
`where`	Destination of columns selected by `...`. Applicable for "after" and "before" mode.

Details

Once you relocate the columns, the order changes forever.

Value

A data.table with rearranged columns.

Examples

df <- data.table(a = 1, b = 1, c = 1, d = "a", e = "a", f = "a")
df
df %>% relocate(f)
df %>% relocate(a,how = "last")

df %>% relocate(is.character)
df %>% relocate(is.numeric, how = "last")
df %>% relocate("[aeiou]")

df %>% relocate(a, how = "after",where = f)
df %>% relocate(f, how = "before",where = a)
df %>% relocate(f, how = "before",where = c)
df %>% relocate(f, how = "after",where = c)

df2 <- data.table(a = 1, b = "a", c = 1, d = "a")
df2 %>% relocate(is.numeric,
                    how = "after",
                    where = is.character)
df2 %>% relocate(is.numeric,
                    how="before",
                    where = is.character)
df <- data.table(a = 1, b = 1, c = 1, d = "a", e = "a", f = "a")
df
df %>% relocate(f)
df %>% relocate(a,how = "last")

df %>% relocate(is.character)
df %>% relocate(is.numeric, how = "last")
df %>% relocate("[aeiou]")

df %>% relocate(a, how = "after",where = f)
df %>% relocate(f, how = "before",where = a)
df %>% relocate(f, how = "before",where = c)
df %>% relocate(f, how = "after",where = c)

df2 <- data.table(a = 1, b = "a", c = 1, d = "a")
df2 %>% relocate(is.numeric,
                    how = "after",
                    where = is.character)
df2 %>% relocate(is.numeric,
                    how="before",
                    where = is.character)

Fast value replacement in data frame

Description

replace_vars could replace any value(s) or values that match specific patterns to another specific value in a data.table.

Usage

replace_vars(.data, ..., from = is.na, to)
replace_vars(.data, ..., from = is.na, to)

Arguments

`.data`	A data.table
`...`	Colunms to be replaced. If not specified, use all columns.
`from`	A value, a vector of values or a function returns a logical value. Defaults to `NaN`.
`to`	A value.

Value

A data.table.

Examples

 iris %>% as.data.table() %>%
   mutate(Species = as.character(Species))-> new_iris

 new_iris %>%
   replace_vars(Species, from = "setosa",to = "SS")
 new_iris %>%
   replace_vars(Species,from = c("setosa","virginica"),to = "sv")
 new_iris %>%
   replace_vars(Petal.Width, from = .2,to = 2)
 new_iris %>%
   replace_vars(from = .2,to = NA)
 new_iris %>%
   replace_vars(is.numeric, from = function(x) x > 3, to = 9999 )

iris %>% as.data.table() %>%
   mutate(Species = as.character(Species))-> new_iris

 new_iris %>%
   replace_vars(Species, from = "setosa",to = "SS")
 new_iris %>%
   replace_vars(Species,from = c("setosa","virginica"),to = "sv")
 new_iris %>%
   replace_vars(Petal.Width, from = .2,to = 2)
 new_iris %>%
   replace_vars(from = .2,to = NA)
 new_iris %>%
   replace_vars(is.numeric, from = function(x) x > 3, to = 9999 )

Computation by rows

Description

Compute on a data frame a row-at-a-time. This is most useful when a vectorised function doesn't exist. Only mutate and summarise are supported so far.

Usage

rowwise_mutate(.data, ...)

rowwise_summarise(.data, ...)
rowwise_mutate(.data, ...)

rowwise_summarise(.data, ...)

Arguments

`.data`	A data.table
`...`	Name-value pairs of expressions

Value

A data.table

Examples

# without rowwise
df <- data.table(x = 1:2, y = 3:4, z = 4:5)
df %>% mutate(m = mean(c(x, y, z)))
# with rowwise
df <- data.table(x = 1:2, y = 3:4, z = 4:5)
df %>% rowwise_mutate(m = mean(c(x, y, z)))


# # rowwise is also useful when doing simulations
params = fread(" sim n mean sd
  1  1     1   1
  2  2     2   4
  3  3    -1   2")

params %>%
  rowwise_summarise(sim,z = rnorm(n,mean,sd))

# without rowwise
df <- data.table(x = 1:2, y = 3:4, z = 4:5)
df %>% mutate(m = mean(c(x, y, z)))
# with rowwise
df <- data.table(x = 1:2, y = 3:4, z = 4:5)
df %>% rowwise_mutate(m = mean(c(x, y, z)))


# # rowwise is also useful when doing simulations
params = fread(" sim n mean sd
  1  1     1   1
  2  2     2   4
  3  3    -1   2")

params %>%
  rowwise_summarise(sim,z = rnorm(n,mean,sd))

Select/rename variables by name

Description

Choose or rename variables from a data.table. select() keeps only the variables you mention; rename() keeps all variables.

Usage

select(.data, ...)

select_vars(.data, ..., rm.dup = TRUE)

select_dt(.data, ..., cols = NULL, negate = FALSE)

select_mix(.data, ..., rm.dup = TRUE)

rename(.data, ...)
select(.data, ...)

select_vars(.data, ..., rm.dup = TRUE)

select_dt(.data, ..., cols = NULL, negate = FALSE)

select_mix(.data, ..., rm.dup = TRUE)

rename(.data, ...)

Arguments

`.data`	A data.table
`...`	One or more unquoted expressions separated by commas. Very flexible, same as `tidyfst::select_dt` and `tidyfst::select_mix`. details find `select_dt`.
`rm.dup`	Should duplicated columns be removed? Defaults to `TRUE`.
`cols`	(Optional)A numeric or character vector.
`negate`	Applicable when regular expression and "cols" is used. If `TRUE`, return the non-matched pattern. Default uses `FALSE`.

Details

No copy is made. Once you select or rename a data.table, they would be changed forever. select_vars could select across different data types, names and index. See examples.

select_dt and select_mix is the safe mode of select and select_vars, they keey the original copy but are not memory-efficient when dealing with large data sets.

Value

A data.table

Examples


  a = as.data.table(iris)
  a %>% select(1:3)
  a

  a = as.data.table(iris)
  a %>% select_vars(is.factor,"Se")
  a

  a = as.data.table(iris)
  a %>% select("Se") %>%
    rename(sl = Sepal.Length,
    sw = Sepal.Width)
  a


DT = data.table(a=1:2,b=3:4,c=5:6)
DT
DT %>% rename(B=b)

a = as.data.table(iris)
  a %>% select(1:3)
  a

  a = as.data.table(iris)
  a %>% select_vars(is.factor,"Se")
  a

  a = as.data.table(iris)
  a %>% select("Se") %>%
    rename(sl = Sepal.Length,
    sw = Sepal.Width)
  a


DT = data.table(a=1:2,b=3:4,c=5:6)
DT
DT %>% rename(B=b)

Separate a character column into two columns using a regular expression separator

Description

Given either regular expression, separate() turns a single character column into two columns. Analogous to tidyr::separate, but only split into two columns only.

Usage

separate(.data, separated_colname, into, sep = "[^[:alnum:]]+", remove = TRUE)
separate(.data, separated_colname, into, sep = "[^[:alnum:]]+", remove = TRUE)

Arguments

`.data`	A data frame.
`separated_colname`	Column name, string only.
`into`	Character vector of length 2.
`sep`	Separator between columns.
`remove`	If `TRUE`, remove input column from output data frame.

Value

A data.table

Examples

df <- data.table(x = c(NA, "a.b", "a.d", "b.c"))
df %>% separate(x, c("A", "B"))
# equals to
df <- data.table(x = c(NA, "a.b", "a.d", "b.c"))
df %>% separate("x", c("A", "B"))
df <- data.table(x = c(NA, "a.b", "a.d", "b.c"))
df %>% separate(x, c("A", "B"))
# equals to
df <- data.table(x = c(NA, "a.b", "a.d", "b.c"))
df %>% separate("x", c("A", "B"))

Subset rows using their positions

Description

'slice()' lets you index rows by their (integer) locations. It allows you to select, remove, and duplicate rows. It is accompanied by a number of helpers for common use cases:

* 'slice_head()' and 'slice_tail()' select the first or last rows. * 'slice_sample()' randomly selects rows. * 'slice_min()' and 'slice_max()' select rows with highest or lowest values of a variable.

Usage

slice(.data, ...)

slice_head(.data, n)

slice_tail(.data, n)

slice_max(.data, order_by, n, with_ties = TRUE)

slice_min(.data, order_by, n, with_ties = TRUE)

slice_sample(.data, n, replace = FALSE)
slice(.data, ...)

slice_head(.data, n)

slice_tail(.data, n)

slice_max(.data, order_by, n, with_ties = TRUE)

slice_min(.data, order_by, n, with_ties = TRUE)

slice_sample(.data, n, replace = FALSE)

Arguments

`.data`	A data.table
`...`	Provide either positive values to keep, or negative values to drop. The values provided must be either all positive or all negative.
`n`	When larger than or equal to 1, the number of rows. When between 0 and 1, the proportion of rows to select.
`order_by`	Variable or function of variables to order by.
`with_ties`	Should ties be kept together? The default, 'TRUE', may return more rows than you request. Use 'FALSE' to ignore ties, and return the first 'n' rows.
`replace`	Should sampling be performed with ('TRUE') or without ('FALSE', the default) replacement.

Value

A data.table

Examples


a = as.data.table(iris)
slice(a,1,2)
slice(a,2:3)
slice_head(a,5)
slice_head(a,0.1)
slice_tail(a,5)
slice_tail(a,0.1)
slice_max(a,Sepal.Length,10)
slice_max(a,Sepal.Length,10,with_ties = FALSE)
slice_min(a,Sepal.Length,10)
slice_min(a,Sepal.Length,10,with_ties = FALSE)
slice_sample(a,10)
slice_sample(a,0.1)

a = as.data.table(iris)
slice(a,1,2)
slice(a,2:3)
slice_head(a,5)
slice_head(a,0.1)
slice_tail(a,5)
slice_tail(a,0.1)
slice_max(a,Sepal.Length,10)
slice_max(a,Sepal.Length,10,with_ties = FALSE)
slice_min(a,Sepal.Length,10)
slice_min(a,Sepal.Length,10,with_ties = FALSE)
slice_sample(a,10)
slice_sample(a,0.1)

Summarise columns to single values

Description

Create one or more scalar variables summarizing the variables of an existing data.table.

Usage

summarise(.data, ..., by = NULL)

summarise_when(.data, when, ..., by = NULL)

summarise_vars(.data, .cols = NULL, .func, ..., by)
summarise(.data, ..., by = NULL)

summarise_when(.data, when, ..., by = NULL)

summarise_vars(.data, .cols = NULL, .func, ..., by)

Arguments

`.data`	A data.table
`...`	List of variables or name-value pairs of summary/modifications functions for `summarise_dt`.Additional parameters to be passed to parameter '.func' in `summarise_vars`.
`by`	Unquoted name of grouping variable of list of unquoted names of grouping variables. For details see data.table
`when`	An object which can be coerced to logical mode
`.cols`	Columns to be summarised.
`.func`	Function to be run within each column, should return a value or vectors with same length.

Value

A data.table

Examples


a = as.data.table(iris)
a %>% summarise(sum = sum(Sepal.Length),avg = mean(Sepal.Length))


a %>%
  summarise_when(Sepal.Length > 5, avg = mean(Sepal.Length), by = Species)

a %>%
  summarise_vars(is.numeric, min, by = Species)


a = as.data.table(iris)
a %>% summarise(sum = sum(Sepal.Length),avg = mean(Sepal.Length))


a %>%
  summarise_when(Sepal.Length > 5, avg = mean(Sepal.Length), by = Species)

a %>%
  summarise_vars(is.numeric, min, by = Species)

Convenient print of time taken

Description

Convenient printing of time elapsed. A wrapper of data.table::timetaken, but showing the results more directly.

Usage

sys_time_print(expr)
sys_time_print(expr)

Arguments

expr

Valid R expression to be timed.

Value

A character vector of the form HH:MM:SS, or SS.MMMsec if under 60 seconds. See examples.

Examples


sys_time_print(Sys.sleep(1))

a = as.data.table(iris)
sys_time_print({
  res = a %>%
    mutate(one = 1)
})
res
sys_time_print(Sys.sleep(1))

a = as.data.table(iris)
sys_time_print({
  res = a %>%
    mutate(one = 1)
})
res

"Uncount" a data frame

Description

Performs the opposite operation to 'dplyr::count()', duplicating rows according to a weighting variable (or expression). Analogous to 'tidyr::uncount'.

Usage

uncount(.data, wt, .remove = TRUE)
uncount(.data, wt, .remove = TRUE)

Arguments

`.data`	A data.frame
`wt`	A vector of weights.
`.remove`	Should the column for `weights` be removed? Default uses `TRUE`.

Value

A data.table

Examples


df <- data.table(x = c("a", "b"), n = c(1, 2))
uncount(df, n)
uncount(df,n,FALSE)
df <- data.table(x = c("a", "b"), n = c(1, 2))
uncount(df, n)
uncount(df,n,FALSE)

Unite multiple columns into one by pasting strings together

Description

Convenience function to paste together multiple columns into one. Analogous to tidyr::unite.

Usage

unite(.data, united_colname, ..., sep = "_", remove = FALSE, na2char = FALSE)
unite(.data, united_colname, ..., sep = "_", remove = FALSE, na2char = FALSE)

Arguments

`.data`	A data frame.
`united_colname`	The name of the new column, string only.
`...`	A selection of columns. If want to select all columns, pass "" to the parameter. See example.
`sep`	Separator to use between values.
`remove`	If `TRUE`, remove input columns from output data frame.
`na2char`	If `FALSE`, missing values would be merged into `NA`, otherwise `NA` is treated as character "NA". This is different from tidyr.

Value

A data.table

Examples

df <- CJ(x = c("a", NA), y = c("b", NA))
df

# Treat missing value as NA, default
df %>% unite("z", x:y, remove = FALSE)
# Treat missing value as character "NA"
df %>% unite("z", x:y, na2char = TRUE, remove = FALSE)
# the unite has memory, "z" would not be removed in new operations
# here we remove the original columns ("x" and "y")
df %>% unite("xy", x:y,remove = TRUE)

# Select all columns
iris %>% as.data.table %>% unite("merged_name",".")
df <- CJ(x = c("a", NA), y = c("b", NA))
df

# Treat missing value as NA, default
df %>% unite("z", x:y, remove = FALSE)
# Treat missing value as character "NA"
df %>% unite("z", x:y, na2char = TRUE, remove = FALSE)
# the unite has memory, "z" would not be removed in new operations
# here we remove the original columns ("x" and "y")
df %>% unite("xy", x:y,remove = TRUE)

# Select all columns
iris %>% as.data.table %>% unite("merged_name",".")

Use UTF-8 for character encoding in a data frame

Description

fread from data.table could not recognize the encoding and return the correct form, this could be unconvenient for text mining tasks. The utf8-encoding could use "UTF-8" as the encoding to override the current encoding of characters in a data frame.

Usage

utf8_encoding(.data, .cols)
utf8_encoding(.data, .cols)

Arguments

`.data`	A data.frame.
`.cols`	The columns you want to convert, usually a character column.

Value

A data.table with characters in UTF-8 encoding

Examples

iris %>%
  as.data.table() %>%
  utf8_encoding(Species)  # could also use `is.factor`
iris %>%
  as.data.table() %>%
  utf8_encoding(Species)  # could also use `is.factor`

Package 'tidyft'

Help Index

Arrange entries in data.frame

Description

Usage

Arguments

Details

Value

See Also

Examples

Save a data.frame as a fst table

Description

Usage

Arguments

Value

Examples

Complete a data frame with missing combinations of data

Description

Usage

Arguments

Details

Value

See Also

Examples

Count observations by group

Description

Usage

Arguments

Value

Examples

Cumulative mean

Description

Usage

Arguments

Value

Examples

Select distinct/unique rows in data.table

Description

Usage

Arguments

Value

See Also

Examples

Drop or delete data by rows or columns

Description

Usage

Arguments

Value

Examples

Fast creation of dummy variables

Description

Usage

Arguments

Details

Value

See Also

Examples

Read and write fst files

Description

Usage

Arguments

Value

See Also

Examples

Fill in missing values with previous or next value

Description

Usage

Arguments

Details

Value

Examples

Filter entries in data.frame

Description

Usage

Arguments

Details

Value

References

See Also

Examples