2 The reading list
At present, the name speaks for itself: this section collects documents and interesting reading (finished or not) that are relevant in the context of the laboratory work. The first few entries are mostly about double descent, because that is what the lab was initially tested on.
2.1 Learning theory
2.1.1 Double descent
There is a fair amount to read in the double descent section:
- Deep Double Descent: Where Bigger Models and More Data Hurt
- A Model of Double Descent for High-Dimensional Binary Linear Classification
- Deep Double Descent via Smooth Interpolation
- On the Role of Optimization in Double Descent: A Least Squares Study
- Two models of double descent for weak features
- To understand double descent, we need to understand VC theory
- Analysis of Interpolating Regression Models and the Double Descent Phenomenon
- A context-free grammar for peaks and double descents of permutations
- A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning
- Understanding the Double Descent Phenomenon in Deep Learning
- Reconciling modern machine learning practice and the bias-variance trade-off
Most of them have been read for now. The foundational papers are Belkin et al.'s and the Deep Double Descent one, though the latter takes a more nuanced position. The updated list is more closely in line with the documented analysis:
- Reconciling modern machine-learning practice (Belkin et al.) – link here
- Deep Double Descent (Nakkiran et al.) – link here
- Surprises in High-Dimensional Ridgeless Least Squares Interpolation – link here
- Changing the Kernel During Training Leads to Double Descent in Kernel Regression – link here
- Double Descent Demystified: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle – link here
- Multi-scale Feature Learning Dynamics: Insights for Double Descent – link here
- More Data Can Hurt for Linear Regression: Sample-wise Double Descent – link here
- On Double Descent in Reinforcement Learning with LSTD and Random Features – link here
- Homophily modulates double descent generalization in graph convolution networks – link here
- An Overview of Double Descent and Overparameterization – link here
- Unified View of Grokking, Double Descent and Emergent Abilities: A Perspective from Circuits Competition – link here
- Kernel regression in high dimensions: Refined analysis beyond double descent – link here
- Double Descent: Understanding Linear Model Estimation of Nonidentifiable Parameters and a Model for Overfitting – link here
- Two models of double descent for weak features – link here
- Double Descent of Discrepancy: A Task-, Data-, and Model-Agnostic Phenomenon – link here
- Manipulating Sparse Double Descent – link here
- A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning – link here
- Double Descent Meets Out-of-Distribution Detection: Theoretical Insights and Empirical Analysis on the role of model complexity – link here
- Dropout Drops Double Descent – link here
- The Double Descent Behavior in Two Layer Neural Network for Binary Classification – link here
- Asymptotic Risk of Overparameterized Likelihood Models: Double Descent Theory for Deep Neural Networks – link here
- Double-Descent Curves in Neural Networks: A New Perspective Using Gaussian Processes – link here
- To understand double descent, we need to understand VC theory – link here
Some of these have already been analyzed, including the reinforcement learning one; most of them will appear in the manuscript.
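
Since several of the listed papers study double descent in ridgeless least squares, a minimal sketch of the phenomenon may help as a reference point. The setup below (Gaussian data, a fixed random ReLU feature map, minimum-norm fit via the pseudoinverse) is an illustrative assumption of mine, not the exact experiment of any listed paper; with a single seed the numbers are noisy, but the test error typically peaks near the interpolation threshold p ≈ n_train and descends again well beyond it.

import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 100, 2000, 20

# Linear ground truth with additive label noise (illustrative choice).
w_true = rng.normal(size=d)
X_train = rng.normal(size=(n_train, d))
X_test = rng.normal(size=(n_test, d))
y_train = X_train @ w_true + 0.5 * rng.normal(size=n_train)
y_test = X_test @ w_true

def relu_features(X, W):
    # Fixed random ReLU feature map; p = W.shape[1] plays the role of model size.
    return np.maximum(X @ W, 0.0)

for p in (10, 50, 90, 100, 110, 200, 500, 2000):
    W = rng.normal(size=(d, p)) / np.sqrt(d)
    Phi_tr = relu_features(X_train, W)
    Phi_te = relu_features(X_test, W)
    # Ridgeless fit: minimum-norm least-squares solution via the pseudoinverse.
    beta = np.linalg.pinv(Phi_tr) @ y_train
    mse = np.mean((Phi_te @ beta - y_test) ** 2)
    print(f"p = {p:4d}   test MSE = {mse:8.3f}")

Averaging the loop over a few seeds smooths the curve; sweeping n_train instead of p gives the sample-wise variant discussed in the "More Data Can Hurt for Linear Regression" entry above.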