Dangers of mixing [tidyverse] and [data.table] syntax in R?

I'm getting some very weird behavior from mixing tidyverse and data.table syntax. For context, I often find myself using tidyverse syntax, and then adding a pipe back to data.table when I need speed vs. when I need code readability. I know Hadley's working on a new package that uses tidyverse syntax with data.table speed, but from what I see, it's still in it's nascent phases, so I haven't been using it.

Anyone care to explain what's going on here? This is very scary for me, as I've probably done these thousands of times without thinking.

library(dplyr); library(data.table)
DT <-
  fread(
    "iso3c  country income
MOZ Mozambique  LIC
ZMB Zambia  LMIC
ALB Albania UMIC
MOZ Mozambique  LIC
ZMB Zambia  LMIC
ALB Albania UMIC
"
  )

codes <- c("ALB", "ZMB")

# now, what happens if I use a tidyverse function (distinct) and then
# convert back to data.table?
DT <- distinct(DT) %>% as.data.table()

# this works like normal
DT[iso3c %in% codes]
# iso3c country income
# 1:   ZMB  Zambia   LMIC
# 2:   ALB Albania   UMIC

# now, what happens if I use a different tidyverse function (arrange) 
# and then convert back to data.table?
DT <- DT %>% arrange(iso3c) %>% as.data.table()

# this is wack: (!!!!!!!!!!!!)
DT[iso3c %in% codes]
# iso3c country income
# 1:   ALB Albania   UMIC

# but these work:
DT[(iso3c %in% codes), ]
# iso3c country income
# 1:   ZMB  Zambia   LMIC
# 2:   ALB Albania   UMIC
DT[DT$iso3c %in% codes, ]
# iso3c country income
# 1:   ZMB  Zambia   LMIC
# 2:   ALB Albania   UMIC
DT[DT$iso3c %in% codes]
# iso3c country income
# 1:   ZMB  Zambia   LMIC
# 2:   ALB Albania   UMIC


Read more here: https://stackoverflow.com/questions/67940098/dangers-of-mixing-tidyverse-and-data-table-syntax-in-r

Content Attribution

This content was originally published by Daycent at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.

%d bloggers like this: