What is Vanishing Gradient?

Vanishing Gradient

Deep Learning

The vanishing gradient problem occurs when gradients become extremely small during backpropagation through many layers, making it difficult to train deep networks. Skip connections and normalization techniques were developed to address this issue.

Understanding Vanishing Gradient

The vanishing gradient problem occurs when gradients become exponentially smaller as they propagate backward through many layers during backpropagation, effectively preventing earlier layers from learning. This was a major obstacle in training deep neural networks, as the sigmoid function and tanh activation functions compress gradients into increasingly narrow ranges with each layer. The problem severely limited the depth of trainable networks for decades. Key solutions include the ReLU activation function, which maintains constant gradients for positive inputs, residual connections (skip connections) that provide gradient shortcuts as used in ResNet architectures, and gating mechanisms in LSTM networks. Batch normalization and careful weight initialization techniques like Xavier and He initialization also mitigate the problem. The resolution of vanishing gradients was a pivotal breakthrough enabling modern deep learning.

Variational Autoencoder

Back to glossary

Vanishing Gradient

Understanding Vanishing Gradient

Related in Deep Learning

Activation Function

Adam Optimizer

Adapter Layers

Attention Mechanism

Autoencoder

Backpropagation

Batch Normalization

Batch Size