Reinforced
December 2025

Positive Gradients, Negative Gradients + the Importance of Pre-Training Priors
Dec 19, 2025 • Alex Nikulkov

April 2024

Bandits vs Reinforcement Learning from Human Feedback
Are single-step or multi-step models better suited for RLHF? This post covers the main differences between them.
Apr 30, 2024 • Alex Nikulkov
Reward Model Overoptimization: Root Causes and Mitigations
When I first ran an RLHF training job, I was surprised at how easily the reward model scores increased during the training process.
Apr 7, 2024 • Alex Nikulkov

January 2024

Reward Modeling for RLHF
An introduction to reward models
Jan 10, 2024 • Alex Nikulkov
Hello World
Welcome to Reinforced!
Jan 7, 2024 • Alex Nikulkov