How to interpret clearly the meaning of the units parameter in Keras?

Question

I am wondering how LSTMs work in Keras. In this tutorial, for example, as in many others, you can find something like this:

model.add(LSTM(4, input_shape=(1, look_back)))

What does the "4" mean? Is it the number of neurons in the layer? By neuron, I mean something that gives a single output for each instance.

Actually, I found this brilliant discussion, but I wasn't really convinced by the explanation in the reference it gives.

On the scheme, one can see num_units illustrated, and I think I am not wrong in saying that each of these units is a very atomic LSTM unit (i.e. the 4 gates). However, how are these units connected? If I am right (but not sure), x_(t-1) is of size nb_features, so each feature would be an input of a unit, and num_units must be equal to nb_features, right?

Now, let's talk about Keras. I have read this post and the accepted answer, and I am confused. Indeed, the answer says:

Basically, the shape is like (batch_size, timespan, input_dim), where input_dim can be different from the unit

In which case? I have trouble reconciling this with the previous reference...

Moreover, it says,

LSTM in Keras only define exactly one LSTM block, whose cells is of unit-length.

Okay, but how do I define a full LSTM layer? Is it the input_shape that implicitly creates as many blocks as the number of time_steps (which, as I understand it, is the first element of the input_shape parameter in my piece of code)?

Thanks for enlightening me.

EDIT: would it also be possible to explain clearly how to reshape data of, say, size (n_samples, n_features) for a stateful LSTM model? How should time_steps and batch_size be handled?

score 6 · Answer 1 · answered May 15 '20 at 08:01

First, units in LSTM is NOT the number of time_steps.

Each LSTM cell (present at a given time_step) takes an input x and produces a hidden state vector a; the length of this hidden state vector is what Keras calls units in LSTM.

You should keep in mind that there is only one RNN cell created by the code

keras.layers.LSTM(units, activation='tanh', …… )

and the RNN operations are repeated Tx times by the class itself.

I've linked this to help you understand it better with some very simple code.
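To make the "one cell, repeated Tx times" idea concrete, here is a minimal pure-Python sketch of an LSTM forward pass. It is not Keras's implementation: the helper names (`lstm_step`, `run_lstm`), the random weight initialisation, and the zero biases are all assumptions made for illustration. The point is only that the same weights are reused at every step and that h and c always have length units, whatever the number of steps or features.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    # Plain matrix-vector product on nested lists.
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def make_matrix(rows, cols, rng):
    # Hypothetical initialisation: small uniform random weights.
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def lstm_step(x, h, c, params):
    """One time step of a single LSTM cell; returns new (h, c) of length units."""
    pre = {}
    for name in ("i", "f", "o", "g"):
        W, U, b = params[name]  # input weights, recurrent weights, bias
        pre[name] = [wx + uh + bias
                     for wx, uh, bias in zip(matvec(W, x), matvec(U, h), b)]
    i = [sigmoid(v) for v in pre["i"]]    # input gate
    f = [sigmoid(v) for v in pre["f"]]    # forget gate
    o = [sigmoid(v) for v in pre["o"]]    # output gate
    g = [math.tanh(v) for v in pre["g"]]  # cell candidate
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def run_lstm(sequence, units, seed=0):
    """Apply the SAME cell (same weights) to every time step of one sequence."""
    rng = random.Random(seed)
    n_features = len(sequence[0])
    params = {name: (make_matrix(units, n_features, rng),
                     make_matrix(units, units, rng),
                     [0.0] * units)
              for name in ("i", "f", "o", "g")}
    h = [0.0] * units
    c = [0.0] * units
    all_h = []
    for x in sequence:  # this loop is the "repeated Tx times" part
        h, c = lstm_step(x, h, c, params)
        all_h.append(h)
    return all_h, h, c


# 20 time steps, 1 feature per step, units=4.
sequence = [[0.1 * t] for t in range(20)]
all_h, last_h, last_c = run_lstm(sequence, units=4)
print(len(all_h), len(all_h[0]))  # 20 4
print(len(last_h), len(last_c))   # 4 4
```

Here `all_h` plays the role of what return_sequences=True would give you for one sample, and `last_h` the role of the default output.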

score 5 · Accepted Answer · edited Jun 20 '20 at 09:12

You can (sort of) think of it exactly as you think of fully connected layers. Units are neurons.

The dimension of the output is the number of neurons, as with most of the well known layer types.

The difference is that in LSTMs these neurons are not completely independent of each other: they intercommunicate through the mathematical operations under the hood.

Before going further, it might be interesting to take a look at this very complete explanation of LSTMs, their inputs/outputs, and the usage of stateful=True/False: Understanding Keras LSTMs. Notice that your input shape should be input_shape=(look_back, 1). The input shape is (time_steps, features).
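As a rough illustration of the (time_steps, features) layout, the sketch below slices a univariate series into windows of look_back steps with 1 feature each. The helper `make_windows` and the toy series are hypothetical, not taken from the tutorial; in practice you would usually do this with NumPy and then pass the array to the model.

```python
def make_windows(series, look_back):
    """Turn a 1-D series into samples of shape (look_back, 1) plus next-step targets."""
    windows = []
    targets = []
    for start in range(len(series) - look_back):
        window = series[start:start + look_back]
        # Wrap each value in its own list so that features = 1.
        windows.append([[value] for value in window])
        targets.append(series[start + look_back])
    return windows, targets


series = [10, 20, 30, 40, 50, 60]
X, y = make_windows(series, look_back=3)

# Overall shape is (samples, time_steps, features).
print(len(X), len(X[0]), len(X[0][0]))  # 3 3 1
print(X[0], y[0])                       # [[10], [20], [30]] 40
```

Each element of X is one sample of shape (look_back, 1), which matches input_shape=(look_back, 1).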

While this is a series of fully connected layers:

hidden layer 1: 4 units
hidden layer 2: 4 units
output layer: 1 unit

This is a series of LSTM layers:

Where the input has shape (batch_size, arbitrary_steps, 3)

Each LSTM layer will keep reusing the same units/neurons over and over until all the arbitrary timesteps in the input are processed.

The output will have shape:
- (batch, arbitrary_steps, units) if return_sequences=True.
- (batch, units) if return_sequences=False.
The memory states will each have a size of units.
The processed inputs carried over from the previous step will also have a size of units.

To be really precise, there are two groups of units: one working on the raw inputs, the other working on the already processed inputs coming from the previous step. Due to the internal structure, each group's weight matrix is 4 times wider than the number of units (this 4 comes from the gates and is not related to the image; it's fixed).
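The factor of 4 can be checked with a small parameter count. This sketch assumes the usual Keras weight layout for an LSTM layer: a kernel of shape (input_dim, 4 * units) for the raw inputs, a recurrent kernel of shape (units, 4 * units) for the previous step's output, and a bias of length 4 * units. The function name `lstm_param_count` is made up for this example.

```python
def lstm_param_count(input_dim, units, use_bias=True):
    """Count trainable weights of one LSTM layer under the layout described above."""
    kernel = input_dim * 4 * units        # group working on the raw inputs
    recurrent_kernel = units * 4 * units  # group working on the previous step's output
    bias = 4 * units if use_bias else 0
    return kernel + recurrent_kernel + bias


# The three stacked layers from this answer: 3 features -> 4 units -> 4 units -> 1 unit.
print(lstm_param_count(input_dim=3, units=4))  # 128
print(lstm_param_count(input_dim=4, units=4))  # 144
print(lstm_param_count(input_dim=4, units=1))  # 24
```

Note that the number of time steps does not appear anywhere in the count, because the same weights are reused at every step.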

Flow:

Takes an input with n steps and 3 features
Layer 1:
- For each time step in the inputs:
  - Uses 4 units on the inputs to get a size 4 result
  - Uses 4 recurrent units on the outputs of the previous step
- Outputs the last (return_sequences=False) or all (return_sequences = True) steps
  - output features = 4
Layer 2:
- Same as layer 1
Layer 3:
- For each time step in the inputs:
  - Uses 1 unit on the inputs to get a size 1 result
  - Uses 1 unit on the outputs of the previous step
- Outputs the last (return_sequences=False) or all (return_sequences = True) steps
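The flow above can be followed with some simple shape bookkeeping. This is only a sketch: `lstm_output_shape` is a hypothetical helper that mimics the output shapes listed earlier, and the batch size of 32 and the 10 steps are arbitrary values picked for the example. Layers 1 and 2 are assumed to use return_sequences=True, since an LSTM layer that feeds another LSTM layer has to pass on every step.

```python
def lstm_output_shape(input_shape, units, return_sequences):
    """Output shape of an LSTM layer given a (batch, steps, features) input shape."""
    batch, steps, _features = input_shape
    if return_sequences:
        return (batch, steps, units)
    return (batch, units)


shape = (32, 10, 3)  # (batch_size, arbitrary_steps, 3 features)

shape = lstm_output_shape(shape, units=4, return_sequences=True)   # layer 1
print(shape)  # (32, 10, 4)
shape = lstm_output_shape(shape, units=4, return_sequences=True)   # layer 2
print(shape)  # (32, 10, 4)
shape = lstm_output_shape(shape, units=1, return_sequences=False)  # layer 3
print(shape)  # (32, 1)
```

The features dimension of each layer's output is always that layer's units, regardless of how many features came in.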

Thanks for this great explanation. Have you seen my question here : https://stackoverflow.com/questions/52071751/how-to-prepare-data-for-stateful-lstm-in-keras ? I would be glad if you could answer, as I did not find quality answer about it... Thanks a lot — MysteryGuy, Aug 29 '18 at 11:50

score 2 · Answer 3 · answered Aug 20 '18 at 14:59

The number of units is the size (length) of the internal state vectors, h and c, of the LSTM. That is, no matter the shape of the input, it is projected to this size (by a dense transformation) by the various kernels for the i, f, and o gates. The details of how the resulting latent features are transformed into h and c are described in the linked post. In your example, the input shape of data

(batch_size, timesteps, input_dim)

will be transformed to

(batch_size, timesteps, 4)

if return_sequences is True; otherwise only the last h will be emitted, making it (batch_size, 4). I would recommend using a much higher latent dimension, perhaps 128 or 256, for most problems.

modesitt

@MysteryGuy the link under LSTM. – modesitt Aug 20 '18 at 16:13
So, `h` and `c` are vectors of dim 1, right ? What about the representation in it : https://www.knowledgemapper.com/knowmap/knowbook/jasdeepchhabra94@gmail.comUnderstandingLSTMinTensorflow(MNISTdataset) : what do the units mean in that case ? – MysteryGuy Aug 21 '18 at 07:33
I am doubtful after [this](https://github.com/keras-team/keras/issues/7403#issuecomment-326797603) explanation which says that the `units` is the number of LSTM cells at that layer. Do you agree? – asn May 15 '20 at 12:42

score 0 · Answer 4 · answered Dec 27 '19 at 16:47

I would put it this way: there are 4 LSTM "neurons" or "units", each with 1 Cell State and 1 Hidden State for each timestep they process. So for an input of 1 timestep, you will have 4 Cell States, 4 Hidden States, and 4 Outputs.

Actually, the correct way to say this is: for an input of one timestep, you have 1 Cell State (a vector of size 4), 1 Hidden State (a vector of size 4), and 1 Output (a vector of size 4).

So if you feed in a timeseries with 20 steps, you will have 20 (intermediate) Cell States, each of size 4. That is because the inputs in LSTM are processed sequentially, 1 after the other. Similarly you will have 20 Hidden States, each of size 4.

Usually, your output will be the output of the LAST step (a vector of size 4). However, if you want the outputs of each intermediate step (remember you have 20 timesteps to process), you can set return_sequences=True. In that case you will have 20 vectors of size 4, each telling you what the output was when the corresponding step was processed as those 20 inputs came in one after the other.

If you set return_state=True, you also get the last Hidden State (size 4) and the last Cell State (size 4).
