I am currently trying to implement a machine learning algorithm that involves the logistic loss function in MATLAB. Unfortunately, I am having some trouble due to numerical overflow.
In general, for a given input s, the value of the logistic loss function is:
log(1 + exp(s))
and the slope of the logistic loss function is:
exp(s)./(1 + exp(s)) = 1./(1 + exp(-s))
In my algorithm, s = X*beta, where X is a matrix with N data points and P features per data point (i.e. size(X) = [N P]) and beta is a vector of P coefficients, one per feature, such that size(beta) = [P 1].
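For concreteness, here is a minimal sketch of the setup (the dimensions and random data are purely illustrative):

N = 5;              % number of data points (illustrative)
P = 3;              % number of features (illustrative)
X = randn(N, P);    % design matrix, size [N P]
beta = randn(P, 1); % coefficient vector, size [P 1]
s = X*beta;         % linear scores, size [N 1]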
I am specifically interested in calculating the average value and gradient of the logistic loss for a given value of beta.
The average value of the logistic loss for a given beta is:
L = 1/N * sum(log(1+exp(X*beta)),1)
The gradient of L with respect to beta (the average of the slope weighted by the features) is:
dL = 1/N * ((exp(X*beta)./(1 + exp(X*beta)))' * X)'
Note that size(dL) = [P 1].
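Putting this together, my current (naive, vectorized) implementation looks roughly like the sketch below; logistic_objective is just an illustrative name:

% Naive implementation of the average logistic loss and its gradient.
% Overflows when entries of X*beta are large and positive.
function [L, dL] = logistic_objective(beta, X)
    N = size(X, 1);
    s = X*beta;                                 % linear scores, size [N 1]
    L  = 1/N * sum(log(1 + exp(s)), 1);         % average loss (scalar)
    dL = 1/N * ((exp(s)./(1 + exp(s)))' * X)';  % gradient w.r.t. beta, size [P 1]
end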
My issue is that these expressions keep producing numerical overflows. The problem effectively comes from the fact that, in double precision, exp(s) overflows to Inf once s exceeds roughly 709 and underflows to 0 once s drops below roughly -745, so for large positive s the value becomes Inf and the slope becomes NaN (Inf/Inf).
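For example, in double precision:

exp(710)                  % Inf  (overflow; the limit is about exp(709.78))
log(1 + exp(710))         % Inf  -- the loss blows up
exp(710)./(1 + exp(710))  % NaN  -- Inf/Inf, the slope is corrupted
1./(1 + exp(-710))        % 1    -- this form of the slope is fine for large positive s
exp(-800)                 % 0    (underflow)
log(1 + exp(-800))        % 0    -- the true value, roughly exp(-800), is lost to underflow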
I am looking for a solution such that s can take on any value in floating point arithmetic. Ideally, I would also really appreciate a solution that allows me to evaluate the value and gradient in a vectorized / efficient way.