Wednesday 15 July 2015

r - How to use zoo or xts with large data?


How can I use the R packages zoo or xts with very large data sets (100 GB)? I know there are packages such as bigrf, ff, and bigmemory that can deal with this problem, but you are restricted to their limited set of commands. They do not have the functions of zoo or xts, and I do not know how to make zoo or xts use them. How can I do this?

I have also seen that there are other options related to databases, such as sqldf, hadoopstreaming, RHadoop, or whatever Revolution R uses. What do you recommend? Anything else?

I just want to aggregate the series, clean them, and run some cointegration tests and plots.

Added: I am on Windows

I have had a similar problem (though I was only playing with 9-10 GB). My experience is that there is no way R can handle that much data on its own, especially when your dataset contains time series data.

If your dataset contains a lot of zeros, you may be able to handle it using sparse matrices - see the Matrix package (); this manual may also be useful ().
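To give a rough idea of what the sparse route buys you, here is a minimal sketch using the Matrix package. The toy dimensions and the share of non-zero cells are made up for illustration and say nothing about the asker's actual data.

```r
library(Matrix)  # recommended package, ships with standard R installations

# Hypothetical toy data: 1000 time steps x 50 series, almost all zeros.
set.seed(1)
n_rows <- 1000
n_cols <- 50
n_nonzero <- 200

# Pick distinct cells and convert the linear index to (row, column) pairs.
cells <- sample(n_rows * n_cols, n_nonzero)
i <- (cells - 1) %% n_rows + 1
j <- (cells - 1) %/% n_rows + 1

# Only the non-zero entries are stored.
sparse_m <- sparseMatrix(i = i, j = j, x = rnorm(n_nonzero),
                         dims = c(n_rows, n_cols))

# The same data as an ordinary dense matrix, for comparison.
dense_m <- as.matrix(sparse_m)

print(object.size(dense_m), units = "Kb")
print(object.size(sparse_m), units = "Kb")

# Common operations work directly on the sparse object.
col_totals <- colSums(sparse_m)
```

The saving only materialises if the data really is mostly zeros; a dense price series gains nothing from this.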

I used PostgreSQL - the relevant R package is RPostgreSQL (). It lets you query your PostgreSQL database using SQL syntax, and the data is downloaded into R as a data frame. It can be slow (depending on the complexity of your query), but it is robust and can be handy for data aggregation.
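As a sketch of that workflow: let the database do the heavy aggregation and pull only the small result into R, where it can be turned into an xts object. The table name `ticks`, the columns `ts` and `price`, and the connection details are all hypothetical; adjust them to your own schema.

```r
library(DBI)
library(RPostgreSQL)
library(xts)

# Aggregate inside PostgreSQL, then convert the (small) result to xts.
# Assumes a hypothetical table ticks(ts timestamp, price numeric).
fetch_daily_means <- function(con) {
  sql <- "SELECT CAST(ts AS date) AS day, avg(price) AS avg_price
          FROM ticks
          GROUP BY 1
          ORDER BY 1"
  daily <- dbGetQuery(con, sql)  # returns a data frame
  xts(daily$avg_price, order.by = as.Date(daily$day))
}

# Usage (connection parameters are placeholders):
# con <- dbConnect(PostgreSQL(), dbname = "marketdata", host = "localhost",
#                  user = "ruser", password = Sys.getenv("PGPASSWORD"))
# daily_xts <- fetch_daily_means(con)
# plot(daily_xts)
# dbDisconnect(con)
```

Only the daily averages ever reach R's memory, so the 100 GB stays in the database.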

Drawback: you need to upload the data into the database first. Your raw data has to be clean and saved in some readable format (txt / csv). If your data is not already in a sensible format, this is likely to be the biggest issue. Uploading "well-behaved" data into the DB, on the other hand, is easy (see and ).
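One way to do the upload from R without reading the whole file at once is to stream the CSV in chunks. The sketch below is only an illustration: `load_csv_in_chunks` and `write_chunk` are names I made up, and it assumes a clean comma-separated file with a header row and no line breaks inside quoted fields.

```r
# Read a CSV in fixed-size chunks and hand each chunk to write_chunk().
# Returns the total number of data rows processed.
load_csv_in_chunks <- function(csv_path, write_chunk, chunk_rows = 100000L) {
  input <- file(csv_path, open = "r")
  on.exit(close(input))

  # First line holds the column names; scan() strips any quotes.
  header_line <- readLines(input, n = 1)
  header <- scan(text = header_line, what = "", sep = ",", quiet = TRUE)

  total <- 0L
  repeat {
    lines <- readLines(input, n = chunk_rows)
    if (length(lines) == 0L) break  # end of file
    chunk <- read.csv(text = lines, header = FALSE, col.names = header,
                      stringsAsFactors = FALSE)
    write_chunk(chunk)
    total <- total + nrow(chunk)
  }
  total
}

# Usage with RPostgreSQL (connection and table name are placeholders):
# library(RPostgreSQL)
# con <- dbConnect(PostgreSQL(), dbname = "marketdata")
# load_csv_in_chunks("ticks.csv", function(chunk) {
#   dbWriteTable(con, "ticks", chunk, append = TRUE, row.names = FALSE)
# })
# dbDisconnect(con)
```

For a file this large, PostgreSQL's own bulk loader (the `COPY` command) is usually faster than going through R, so treat this as a convenience for smaller pieces.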

I would recommend PostgreSQL or any other relational database for your task. I did not try Hadoop, but using CouchDB nearly drove me round the bend. Stick with good old SQL.
