As a matter of best practice, I'm trying to determine whether it's better to write a function and apply() it across a matrix, or to simply loop over the matrix inside a function. I tried it both ways and was surprised to find that apply() is slower. The task is to take a vector, test whether each element is positive or negative, and return a vector with 1 for each positive element and -1 for each negative one. The mash() function uses a for loop, and the squish() function is passed to apply().
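To pin down the expected output before timing anything, here is a minimal reference version of the task written with R's vectorised ifelse(). It is not one of the two approaches being compared, and the name sign_pm1 is hypothetical. Like mash() and squish() below, it tests x > 0, so a value of exactly 0 is mapped to -1.

```r
# Hypothetical reference implementation: vectorised ifelse(), used only to
# check what the loop and apply() versions should return.
# Because the test is x > 0, zero falls into the -1 branch.
sign_pm1 <- function(x) ifelse(x > 0, 1, -1)

sign_pm1(c(2.5, -0.3, 0))
# [1]  1 -1 -1
```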
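million <- as.matrix(rnorm(100000))  # 100,000 values, despite the name

# Loop version: overwrite each element with 1 or -1
mash <- function(x){
  for(i in seq_len(NROW(x))) {
    if(x[i] > 0) {
      x[i] <- 1
    } else {
      x[i] <- -1
    }
  }
  return(x)
}

# Element-wise version, meant to be called through apply()
squish <- function(x){
  if(x > 0) {
    return(1)
  } else {
    return(-1)
  }
}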
ptm <- proc.time()
loop_million <- mash(million)
proc.time() - ptm
ptm <- proc.time()
apply_million <- apply(million,1, squish)
proc.time() - ptm
loop_million results:
   user  system elapsed
  0.468   0.008   0.483

apply_million results:
   user  system elapsed
  1.401   0.021   1.423
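One way to make the comparison easier to repeat is sketched below, assuming base R only. It wraps each call in system.time() instead of subtracting proc.time() values, fixes the random seed (the value 1 is arbitrary) so both versions see identical input, and checks that the two results hold the same values. The mash() and squish() definitions are compact restatements of the ones above, not different algorithms.

```r
# Compact restatements of the two functions from the question.
mash <- function(x) {
  for (i in seq_len(NROW(x))) {
    x[i] <- if (x[i] > 0) 1 else -1
  }
  x
}
squish <- function(x) if (x > 0) 1 else -1

set.seed(1)                      # arbitrary seed, for reproducible input
m <- as.matrix(rnorm(1e5))       # 100,000 x 1 matrix, as in the question

# system.time() returns user/system/elapsed for the wrapped expression.
t_loop  <- system.time(loop_res  <- mash(m))
t_apply <- system.time(apply_res <- apply(m, 1, squish))

print(t_loop)
print(t_apply)

# Same values; only the shape differs (matrix vs plain numeric vector).
stopifnot(identical(as.vector(loop_res), apply_res))
```

Timings will vary by machine and R version, so no particular numbers are implied here.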
What is the advantage of using apply() over a for loop if it performs worse? Is there a flaw in my test? I compared the two resulting objects for a clue and found:
> class(apply_million)
[1] "numeric"
> class(loop_million)
[1] "matrix"
This only deepens the mystery. The apply() function won't accept a plain numeric vector, which is why I converted it with as.matrix() at the beginning. Yet it returns a plain numeric vector. The for loop, on the other hand, works fine with a plain numeric vector, and it returns an object of the same class as the one passed to it.
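The class difference can be reproduced on a tiny input. The sketch below assumes base R and uses a hypothetical 3 x 1 matrix m. It shows that apply() rejects an object without dimensions, that sub-assignment such as x[i] <- value replaces elements but keeps the object's attributes (including dim), and that apply() builds a new result from FUN's return values, simplifying one value per row into a plain vector.

```r
m <- as.matrix(c(2, -1, 0))   # hypothetical 3 x 1 matrix

# apply() needs an object with dimensions; a bare vector raises an error
# ("dim(X) must have a positive length").
res_err <- tryCatch(apply(c(2, -1, 0), 1, function(v) v),
                    error = function(e) "error")

# Sub-assignment replaces elements but keeps attributes such as dim,
# so a loop over a matrix hands back a matrix.
x <- m
x[1] <- 5
dim(x)        # [1] 3 1

# apply() collects FUN's return values into a new object; with one value
# per row and MARGIN = 1, that object is a plain vector with no dim.
r <- apply(m, 1, function(v) if (v > 0) 1 else -1)
dim(r)        # NULL
r             # [1]  1 -1 -1

# If the matrix shape is wanted back, it can be restored explicitly.
dim(r) <- dim(m)
```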