Reinforced

April 2024

Bandits vs Reinforcement Learning from Human Feedback
Are single-step (bandit) or multi-step models better suited for RLHF? This post explains the main differences between them.
Apr 30, 2024 • Alex Nikulkov
Reward Model Overoptimization: Root Causes and Mitigations
When I first ran an RLHF training job, I was surprised by how easily the reward model scores increased during training.
Apr 7, 2024 • Alex Nikulkov

January 2024

Reward Modeling for RLHF
An introduction to reward models
Jan 10, 2024 • Alex Nikulkov
Hello World
Welcome to Reinforced!
Jan 7, 2024 • Alex Nikulkov