I am currently working on a Bayesian method that requires multiple steps of optimisation of a multinomial logit model per iteration. I am using optim() to perform those optimisations, and an objective function written in R. A profiling revealed that optim() is the main bottleneck.
After digging around, I found this question in which they suggest that recoding the objective function with Rcpp could speed up the process. I followed the suggestion and recoded my objective function with Rcpp, but it ended up being slower (about two times slower!).
This was my first time with Rcpp (or anything related to C++) and I was not able to find a way of vectorising the code. Any idea how to make it faster?
Tl;dr: Current implementation of function in Rcpp is not as fast as vectorised R; how to make it faster?
A reproducible example:
- Define objective functions in
RandRcpp: log-likelihood of an intercept only multinomial model
library(Rcpp)
library(microbenchmark)
llmnl_int <- function(beta, Obs, n_cat) {
n_Obs <- length(Obs)
Xint <- matrix(c(0, beta), byrow = T, ncol = n_cat, nrow = n_Obs)
ind <- cbind(c(1:n_Obs), Obs)
Xby <- Xint[ind]
Xint <- exp(Xint)
iota <- c(rep(1, (n_cat)))
denom <- log(Xint %*% iota)
return(sum(Xby - denom))
}
cppFunction('double llmnl_int_C(NumericVector beta, NumericVector Obs, int n_cat) {
int n_Obs = Obs.size();
NumericVector betas = (beta.size()+1);
for (int i = 1; i < n_cat; i++) {
betas[i] = beta[i-1];
};
NumericVector Xby = (n_Obs);
NumericMatrix Xint(n_Obs, n_cat);
NumericVector denom = (n_Obs);
for (int i = 0; i < Xby.size(); i++) {
Xint(i,_) = betas;
Xby[i] = Xint(i,Obs[i]-1.0);
Xint(i,_) = exp(Xint(i,_));
denom[i] = log(sum(Xint(i,_)));
};
return sum(Xby - denom);
}')
- Compare their efficiency:
## Draw sample from a multinomial distribution
set.seed(2020)
mnl_sample <- t(rmultinom(n = 1000,size = 1,prob = c(0.3, 0.4, 0.2, 0.1)))
mnl_sample <- apply(mnl_sample,1,function(r) which(r == 1))
## Benchmarking
microbenchmark("llmml_int" = llmnl_int(beta = c(4,2,1), Obs = mnl_sample, n_cat = 4),
"llmml_int_C" = llmnl_int_C(beta = c(4,2,1), Obs = mnl_sample, n_cat = 4),
times = 100)
## Results
# Unit: microseconds
# expr min lq mean median uq max neval
# llmnl_int 76.809 78.6615 81.9677 79.7485 82.8495 124.295 100
# llmnl_int_C 155.405 157.7790 161.7677 159.2200 161.5805 201.655 100
- Now calling them in
optim:
## Benchmarking with optim
microbenchmark("llmnl_int" = optim(c(4,2,1), llmnl_int, Obs = mnl_sample, n_cat = 4, method = "BFGS", hessian = T, control = list(fnscale = -1)),
"llmnl_int_C" = optim(c(4,2,1), llmnl_int_C, Obs = mnl_sample, n_cat = 4, method = "BFGS", hessian = T, control = list(fnscale = -1)),
times = 100)
## Results
# Unit: milliseconds
# expr min lq mean median uq max neval
# llmnl_int 12.49163 13.26338 15.74517 14.12413 18.35461 26.58235 100
# llmnl_int_C 25.57419 25.97413 28.05984 26.34231 30.44012 37.13442 100
I was somewhat surprised that the vectorised implementation in R was faster. Implementing a more efficient version in Rcpp (say, with RcppArmadillo?) can produce any gains? Is it a better idea to recode everything in Rcpp using a C++ optimiser?