2 The reading list
At present, the name speaks for itself: this section collects documents and interesting reading (finished or not) that are relevant in the context of the laboratory work. The first few entries are mostly about double descent, because that is what the lab was initially tested on.
2.1 Learning theory
2.1.1 Double descent
There is a fair amount to read in the double descent section:
- Deep Double Descent: Where Bigger Models and More Data Hurt
- A Model of Double Descent for High-Dimensional Binary Linear Classification
- Deep Double Descent via Smooth Interpolation
- On the Role of Optimization in Double Descent: A Least Squares Study
- Two models of double descent for weak features
- To understand double descent, we need to understand VC theory
- Analysis of Interpolating Regression Models and the Double Descent Phenomenon
- A context-free grammar for peaks and double descents of permutations
- A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning
- Understanding the Double Descent Phenomenon in Deep Learning
- Reconciling modern machine learning practice and the bias-variance trade-off
Most of them have been read for now. The foundational papers are Belkin et al.'s and the Deep Double Descent one, though the latter takes a more nuanced position. The updated list is more closely in line with the documented analysis:
- Reconciling modern machine-learning practice (Belkin et al.) – link here
- Deep Double Descent (Nakkiran et al.) – link here
- Surprises in High-Dimensional Ridgeless Least Squares Interpolation – link here
- Changing the Kernel During Training Leads to Double Descent in Kernel Regression – link here
- Double Descent Demystified: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle – link here
- Multi-scale Feature Learning Dynamics: Insights for Double Descent – link here
- More Data Can Hurt for Linear Regression: Sample-wise Double Descent – link here
- On Double Descent in Reinforcement Learning with LSTD and Random Features – link here
- Homophily modulates double descent generalization in graph convolution networks – link here
- An Overview of Double Descent and Overparameterization – link here
- Unified View of Grokking, Double Descent and Emergent Abilities: A Perspective from Circuits Competition – link here
- Kernel regression in high dimensions: Refined analysis beyond double descent – link here
- Double Descent: Understanding Linear Model Estimation of Nonidentifiable Parameters and a Model for Overfitting – link here
- Two models of double descent for weak features – link here
- Double Descent of Discrepancy: A Task-, Data-, and Model-Agnostic Phenomenon – link here
- Manipulating Sparse Double Descent – link here
- A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning – link here
- Double Descent Meets Out-of-Distribution Detection: Theoretical Insights and Empirical Analysis on the role of model complexity – link here
- Dropout Drops Double Descent – link here
- The Double Descent Behavior in Two Layer Neural Network for Binary Classification – link here
- Asymptotic Risk of Overparameterized Likelihood Models: Double Descent Theory for Deep Neural Networks – link here
- Double-Descent Curves in Neural Networks: A New Perspective Using Gaussian Processes – link here
- To understand double descent, we need to understand VC theory – link here
Some of these have already been analyzed, including the reinforcement learning one; most of them will appear in the manuscript.
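
Since several of the listed papers study double descent in ridgeless least squares, a minimal sketch of the phenomenon may help as a reference point. The setup below (Gaussian data, a fixed random ReLU feature map, minimum-norm fit via the pseudoinverse) is an illustrative assumption of mine, not the exact experiment of any listed paper; with a single seed the numbers are noisy, but the test error typically peaks near the interpolation threshold p ≈ n_train and descends again well beyond it.

import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 100, 2000, 20

# Linear ground truth with additive label noise (illustrative choice).
w_true = rng.normal(size=d)
X_train = rng.normal(size=(n_train, d))
X_test = rng.normal(size=(n_test, d))
y_train = X_train @ w_true + 0.5 * rng.normal(size=n_train)
y_test = X_test @ w_true

def relu_features(X, W):
    # Fixed random ReLU feature map; p = W.shape[1] plays the role of model size.
    return np.maximum(X @ W, 0.0)

for p in (10, 50, 90, 100, 110, 200, 500, 2000):
    W = rng.normal(size=(d, p)) / np.sqrt(d)
    Phi_tr = relu_features(X_train, W)
    Phi_te = relu_features(X_test, W)
    # Ridgeless fit: minimum-norm least-squares solution via the pseudoinverse.
    beta = np.linalg.pinv(Phi_tr) @ y_train
    mse = np.mean((Phi_te @ beta - y_test) ** 2)
    print(f"p = {p:4d}   test MSE = {mse:8.3f}")

Averaging the loop over a few seeds smooths the curve; sweeping n_train instead of p gives the sample-wise variant discussed in the "More Data Can Hurt for Linear Regression" entry above.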