If I create an R data.table with string columns without setting stringsAsFactors=TRUE and then take the unique rows of the table with unique(), the strings are missing from the resulting table, even though they are still taken into account when determining which rows are unique.
> dt <- data.table(x=c('a', 'a', 'b', 'c'), y=c(1, 1, 2, 2), stringsAsFactors=FALSE)
> unique(dt)
x y
1: 1
2: 2
3: 2
> dt <- data.table(x=c('a', 'a', 'b', 'c'), y=c(1, 1, 2, 2), stringsAsFactors=TRUE)
> unique(dt)
x y
1: a 1
2: b 2
3: c 2
Is this the expected behavior? I'm on Cygwin and have run into a few mysterious Cygwin-specific issues in the R internals before. Here's the output of sessionInfo():
R version 3.4.0 (2017-04-21)
Platform: x86_64-unknown-cygwin (64-bit)
Running under: CYGWIN_NT-6.1 INT-3A02 2.8.1(0.312/5/3) 2017-07-03 14:11 x86_64 Cygwin
Matrix products: default
LAPACK: /usr/lib/R/modules/lapack.dll
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.10.4
loaded via a namespace (and not attached):
[1] bit_1.1-12 compiler_3.4.0 bit64_0.9-7
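In case it helps narrow things down, here is a minimal sketch of a check that might distinguish a printing problem from actual data loss. It assumes the same example table as above; the variable name u is just a placeholder, and I have not confirmed which of these calls (if any) behaves differently on Cygwin.

```r
library(data.table)

# Same example data as above; stringsAsFactors defaults to FALSE in data.table()
dt <- data.table(x = c("a", "a", "b", "c"), y = c(1, 1, 2, 2))
u <- unique(dt)

# Inspect the result without going through print.data.table
str(u)

# Look at the character column directly
u$x

# A count of 0 here would mean the strings really are empty,
# not merely hidden when the table is printed
nchar(u$x)

# Compare against the values expected from the input
identical(u$x, c("a", "b", "c"))

# Print through base R's data.frame method for comparison
print(as.data.frame(u))
```

If u$x still holds the strings and only the printed table is blank, the problem is presumably in how the table is displayed rather than in unique() itself.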