neural networks & ai
summer camp @ constructor school, together with Meri Grigoryan
The Vanishing Gradient Problem
by Sven Kriegel
This project uses calculus to explore why vanishing gradients occur in deep neural networks, how they impact training, and which strategies mitigate them.
deliverables
1. Gradients Drive Learning
Explain gradient descent using calculus (derivatives, partial derivatives) and relate it to neural network weight updates.
Requirements
- Define the gradient as the vector of partial derivatives: it points in the direction of steepest ascent, and its negative points in the direction of steepest descent.
- Describe how gradients update weights to minimize loss.
- Include a simple numerical example (e.g., optimizing a quadratic function).
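As one possible starting point for the numerical example, the sketch below runs gradient descent on the quadratic loss f(w) = (w - 3)^2, whose derivative is f'(w) = 2(w - 3). The function, the starting value 0.0, the learning rate 0.1 and the step count are illustrative assumptions, not part of the brief.

```python
# Toy example (all values are assumptions): minimize f(w) = (w - 3)^2.

def loss(w):
    return (w - 3.0) ** 2

def grad(w):
    # Derivative of the loss with respect to w.
    return 2.0 * (w - 3.0)

def gradient_descent(w0, learning_rate, steps):
    """Apply the update w <- w - learning_rate * grad(w) and record every w."""
    w = w0
    history = [w]
    for _ in range(steps):
        w = w - learning_rate * grad(w)
        history.append(w)
    return history

history = gradient_descent(w0=0.0, learning_rate=0.1, steps=25)
for step in (0, 1, 2, 5, 25):
    w = history[step]
    print(f"step {step:2d}: w = {w:.4f}, loss = {loss(w):.6f}")
```

Each update moves w against the gradient, so the loss shrinks at every step and w approaches the minimizer w = 3. A neural network applies the same rule to every weight, with the partial derivative of the loss for that weight in place of f'(w).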
2. Why Gradients Vanish in Deep Networks
Explain the vanishing gradient problem using calculus (chain rule, activation functions) and AI architecture.
Requirements
- Show mathematically how repeated chain rule multiplication shrinks gradients (e.g., using sigmoid/tanh derivatives).
- Visualize how activation functions (sigmoid vs. ReLU) affect gradient flow.
- Link vanishing gradients to slow training/ineffective deep networks.
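The shrinking effect of repeated chain rule multiplication can be sketched numerically. The code below assumes a deliberately simplified chain of scalar layers in which every weight is 1 and every layer sees the same pre-activation z, so the gradient reaching the first layer is just the local activation derivative raised to the power of the depth. The helper name `chained_gradient` and the chosen depths are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    # sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)); its maximum is 0.25 at z = 0.
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_prime(z):
    # ReLU'(z) is 1 for positive inputs and 0 otherwise.
    return 1.0 if z > 0 else 0.0

def chained_gradient(local_derivative, z, depth):
    """Product of `depth` identical local derivatives (toy assumption: weights are 1)."""
    return local_derivative(z) ** depth

for depth in (1, 5, 10, 20):
    sig = chained_gradient(sigmoid_prime, 0.0, depth)  # best case for sigmoid
    rel = chained_gradient(relu_prime, 1.0, depth)     # active ReLU unit
    print(f"depth {depth:2d}: sigmoid chain = {sig:.3e}, ReLU chain = {rel:.1f}")
```

Even in the best case for sigmoid (z = 0, derivative 0.25), ten layers scale the gradient by 0.25^10, which is below one millionth, while an active ReLU unit passes it through unchanged. Real networks also multiply by the weights at each layer, which this sketch leaves out.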
3. Fixing the Vanishing Gradient
Analyze solutions (ResNet, LSTM, activation functions) and their real-world significance.
Requirements
- Describe what causes the vanishing gradient problem in deep neural networks.
- Explain how ResNet's skip connections help reduce this problem. [1]
- Compare ReLU and sigmoid activation functions and why ReLU works better for deep networks.
- Give one example of a real-world application where ResNet or LSTM is used, and explain why the solution helps.
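To illustrate why skip connections help, the sketch below compares a plain chain of scalar layers, x -> sigmoid(w * x), with a residual chain, x -> x + sigmoid(w * x). By the chain rule, each residual layer contributes a factor 1 + w * sigmoid'(w * x) instead of w * sigmoid'(w * x), so the product cannot collapse towards zero when w is positive. This is a toy scalar model under assumed values (weight 1, input 0.5, depth 20), not the actual ResNet architecture, and `input_gradient` is a hypothetical helper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def input_gradient(x, depth, weight=1.0, skip=False):
    """Return d(output)/d(input) for a chain of `depth` scalar layers."""
    grad = 1.0
    for _ in range(depth):
        s = sigmoid(weight * x)
        local = weight * s * (1.0 - s)  # derivative of sigmoid(weight * x) w.r.t. x
        if skip:
            grad *= 1.0 + local         # residual layer: x + sigmoid(weight * x)
            x = x + s
        else:
            grad *= local               # plain layer: sigmoid(weight * x)
            x = s
    return grad

plain = input_gradient(0.5, depth=20)
residual = input_gradient(0.5, depth=20, skip=True)
print(f"plain chain:    {plain:.3e}")
print(f"residual chain: {residual:.3e}")
```

In the plain chain every factor is at most 0.25, so the gradient after 20 layers is vanishingly small. In the residual chain the identity path contributes the constant 1 to every factor, which keeps the gradient at or above 1 in this setup.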
Resources
- Neural Networks by 3Blue1Brown. Videos 1–4.
- Vanishing Gradient Problem. YouTube. Hint: summarize this!
- Vanishing Gradient Problem in Deep Learning: Explained. Shaoni Mukherjee. Good math content.
- Gradient Decay in Neural Networks. Lucas on Stats StackExchange.
[1] Bonus. Use simple examples or diagrams.