I'm learning R and I'm a big fan of the data.table package - I like its database-like syntax and performance.
When I was reading web pages and blogs on data analysis, I found this post:
A Large Data Workflow with Pandas
Data Analysis of 8.2 Million Rows with Python and SQLite
https://plot.ly/ipython-notebooks/big-data-analytics-with-pandas-and-sqlite/
I would like to practice this data analysis with data.table; however, my laptop has only 4 GB of RAM:
➜ ~ free -m
              total        used        free      shared  buff/cache   available
Mem:           3686         966        1976         130         743        2359
Swap:          8551           0        8551
➜ ~
The dataset is a 3.9 GB CSV file, and my available memory is not enough to read the whole file into a data.table. But I'm not willing to give up the data.table package.
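To show where I'm stuck: reading a sample of rows or a subset of columns with fread works fine, but the full file does not fit. (The file name and column names below are just placeholders for my actual data.)

library(data.table)

# A small sample or a few columns fit comfortably in memory...
dt_sample <- fread("nyc311.csv", nrows = 100000L)
dt_cols   <- fread("nyc311.csv", select = c("Agency", "Created Date"))

# ...but the whole 3.9 GB file does not:
# dt_full <- fread("nyc311.csv")   # exhausts memory on this laptop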
Question:

1. Is there a database interface for the data.table package? I searched its documentation but had no luck.
2. If data.table is not the right tool for this task, which approach would you recommend: (1) sqldf, (2) SQLite + dplyr, or (3) the ff/bigmemory packages?
I've noticed that each of the above packages has its own distinctive syntax. The pandas workflow in the linked post does almost all of these tasks with a single set of tools. Is there a similar approach in R? A rough sketch of what I have in mind follows.
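For concreteness, here is roughly the workflow I imagine, modeled on the pandas post: stream the CSV into SQLite in chunks, then pull small aggregated results back into a data.table. This is only a sketch, not something I have working - the file name, table name, column names, and chunk size are placeholders, and the loop assumes the last chunk is smaller than chunk_size.

library(DBI)
library(RSQLite)
library(data.table)

con <- dbConnect(RSQLite::SQLite(), "nyc311.sqlite")

# Read the header once so each chunk can be read with header = FALSE
header     <- names(fread("nyc311.csv", nrows = 0L))
chunk_size <- 500000L
offset     <- 0L
repeat {
  chunk <- fread("nyc311.csv", skip = offset + 1L, nrows = chunk_size,
                 header = FALSE, col.names = header)
  dbWriteTable(con, "complaints", chunk, append = TRUE)
  offset <- offset + nrow(chunk)
  if (nrow(chunk) < chunk_size) break  # simplification: stop on a partial chunk
}

# Aggregate inside SQLite, then treat the small result as a data.table
res <- dbGetQuery(con, "SELECT Agency, COUNT(*) AS n
                        FROM complaints GROUP BY Agency ORDER BY n DESC")
setDT(res)
dbDisconnect(con)

Is something along these lines (or better) already supported by data.table itself, or does it always require going through a second package like DBI?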