→ A recurrent neuron is different from a neuron in a dense layer. It does all the same things as a dense neuron (weighted sum of inputs, then an activation), but with a twist: it also receives its own output from the previous time step as an extra input.
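To make that concrete, here is a minimal sketch of one recurrent neuron's update in NumPy. The weight values (w_x, w_h, b) and the tanh activation are assumptions for illustration, matching the common SimpleRNN formulation:

```python
import numpy as np

w_x, w_h, b = 0.5, 0.8, 0.1   # input weight, recurrent weight, bias (made-up values)
h = 0.0                       # the neuron's previous output; starts at zero

for x_t in [1.0, 0.5, -0.3]:  # a toy 3-step input sequence
    # A dense-style weighted sum of the input, PLUS the neuron's own
    # previous output fed back in, then a tanh activation.
    h = np.tanh(w_x * x_t + w_h * h + b)
    print(h)
```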

→ Because an RNN processes sequences, we need a way to represent that sequence data. That's where the 3D input shape (batch_size, time_steps, features) comes from.
features (Dimension 3): This is the easiest. It's "how many data points describe the input at a single point in time?"
time_steps (Dimension 2): This is the length of your sequence. It's "how many consecutive points in time are you looking at to make one prediction?"
batch_size (Dimension 1): This is just for efficiency. It's "how many separate sequences are you feeding the network at once?"
So, an input shape of (32, 10, 1) for our stock example means: "The network is processing 32 different sequences, each 10 days long (time steps), where each day is described by 1 feature (the closing price)."
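A quick way to see this shape is to build a dummy batch, with random numbers standing in for real closing prices (only the shape matters here):

```python
import numpy as np

batch = np.random.rand(32, 10, 1)  # 32 sequences, 10 days each, 1 feature per day
print(batch.shape)                 # (32, 10, 1)
print(batch[0, :, 0])              # the 10 "closing prices" of the first sequence
```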
→ Single Neuron vs. Layer of Neurons (And Multiple Outputs)
A single recurrent neuron would process the sequence (e.g., 10 time steps) and at each step, it would output a single number (a scalar).
A layer of recurrent neurons (e.g., SimpleRNN(units=64)) is just a collection of these neurons working in parallel. So, instead of one neuron looping, you have 64 independent neurons. At each time step, each of the 64 neurons takes the input and its own previous state, and each produces one output number.

Therefore, the output of the layer at a single time step is not a single number, but a vector of 64 numbers.
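One way to check this is with Keras's SimpleRNNCell, which computes exactly one time step. This is a sketch assuming TensorFlow is installed; the shapes mirror the stock example:

```python
import numpy as np
import tensorflow as tf

cell = tf.keras.layers.SimpleRNNCell(units=64)  # 64 neurons working in parallel
x_t = np.random.rand(32, 1).astype("float32")   # the input at ONE time step
h_prev = [tf.zeros((32, 64))]                   # each neuron's previous output
output, h_new = cell(x_t, h_prev)
print(output.shape)                             # (32, 64): one number per neuron
```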
This is where the "multiple outputs" idea comes in, controlled by a key parameter: return_sequences.
return_sequences=False (Default): The RNN layer will run for all the time steps, but it will only give you the output from the very last time step. It throws away the outputs from t=1, t=2, t=3... and just gives you the final one. The output is a 2D tensor: (batch_size, units). It's a "summary" of the whole sequence.
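A minimal shape check of that default behavior (again assuming TensorFlow, with random data in place of real prices):

```python
import numpy as np
import tensorflow as tf

x = np.random.rand(32, 10, 1).astype("float32")   # 32 sequences, 10 steps, 1 feature
summary = tf.keras.layers.SimpleRNN(units=64)(x)  # return_sequences=False is the default
print(summary.shape)                              # (32, 64): only the last step's 64-vector
```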