Specify column classes when reading in data via lapply(FileList, read.xls)

Question

My question is about how to specify the class of each column when reading in data from many files. More specifically, I am reading in thousands of .xlsx files at a time and converting them to .csv using the read.xls() function in the gdata package.

My approach is as follows:

library(gdata)
Myfiles <- list.files() # lists all files in the working directory (which contains the data files)
Mylist <- lapply(Myfiles, read.xls, header=TRUE,
    perl="C:/Users/A/PERL/perl/bin/perl.exe",
    sheet=1,
    method="csv",
    skip=1,
    as.is=1)

I apologize for not providing a workable example. I'm not sure how to do so for this problem.

All the .xlsx files have identical headers and set-up, but the classes of corresponding columns in the data frames within Mylist are not all the same. Is there a way to specify the classes within the lapply() approach I am using? I know you can extend functions of read.table() to read.xls() but I haven't figured out how to specify the column classes properly within the lapply call.

Have you examined the actual data in the columns that don't seem to have the right class? My guess is that you have some offending characters in there that are mucking things up. You could also use `lapply()` over the resulting list and convert the columns there, as outlined [here](http://stackoverflow.com/questions/3796266/change-the-class-of-many-columns-in-a-data-frame). Finally, if you're just writing back out to CSV, does it really matter? - Chase, Nov 10 '12 at 20:08
Just read the help page for `read.xls()`. Looks like you can pass additional arguments via the `...`, which go to `read.table()`. On the help page there, you'll see one of the parameters is `colClasses`, a character vector whose entries can be any of the six atomic vector classes (among a few other values). These are defined on the help page for `?vector`. - Chase, Nov 10 '12 at 20:13
The syntax is `lapply(Myfiles, read.xls, colClasses = c(...whatever...), ...whatever...)` - G. Grothendieck, Nov 10 '12 at 23:02
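The post-hoc conversion suggested in the first comment could look roughly like the sketch below. The data frames, column names (`id`, `value`), and the helper `coerce_columns()` are all made up for illustration; the real `Mylist` would come from the `read.xls()` call in the question.

```r
# Hypothetical stand-in for Mylist: same headers, but 'value' was read as
# numeric in one data frame and as character in the other.
Mylist <- list(
  data.frame(id = c("a", "b"), value = c(1.5, 2.5), stringsAsFactors = FALSE),
  data.frame(id = c("c", "d"), value = c("3.5", "4.5"), stringsAsFactors = FALSE)
)

# One conversion function per column name (illustrative names, not from the question).
converters <- list(id = as.character, value = as.numeric)

# Hypothetical helper: apply each converter to its column and return the data frame.
coerce_columns <- function(df, converters) {
  for (col in names(converters)) {
    df[[col]] <- converters[[col]](df[[col]])
  }
  df
}

# Convert every data frame in the list the same way.
Fixed <- lapply(Mylist, coerce_columns, converters = converters)

# Inspect the resulting classes, one column of output per data frame.
sapply(Fixed, function(df) sapply(df, class))
```

Entries that cannot be parsed as numbers would come back as `NA` with a warning from `as.numeric()`, which is one way to locate the offending cells.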

score 1 · Accepted Answer · answered Nov 11 '12 at 12:00

It's all in Gabor's comment, but to put this one to bed:

lapply(Myfiles, read.xls, colClasses = c("character", "numeric", "factor"), header=TRUE)
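To see the mechanism without needing Perl or any .xlsx files, here is a minimal sketch that swaps in base R's `read.csv()` for `read.xls()`. The temporary files and their three columns are invented for the demo. The point is only that named arguments given to `lapply()` after the function are forwarded to that function on every call, and `colClasses` is handled the same way once `read.xls()` passes it along via `...`.

```r
# Write two small CSV files to a temporary directory (stand-ins for the real data files).
tmp <- tempfile("colclasses_demo")
dir.create(tmp)
writeLines(c("id,value,group", "001,1.5,a", "002,2.5,b"), file.path(tmp, "file1.csv"))
writeLines(c("id,value,group", "003,3,a", "004,4,c"), file.path(tmp, "file2.csv"))

Myfiles <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# One class per column, in column order. Without colClasses, 'id' would be
# read as integer (losing the leading zeros) and 'value' in file2 as integer.
Mylist <- lapply(Myfiles, read.csv,
                 colClasses = c("character", "numeric", "factor"),
                 header = TRUE)

# Check that every data frame reports the same column classes.
sapply(Mylist, function(df) sapply(df, class))
```

The `colClasses` vector needs one entry per column in the file (or names matching the headers), so the three-element vector in the answer assumes three-column files.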

seancarmody
