Discussion about this post

Daniel Popescu / Pluralisk

This article comes at the perfect time; it's genuinely insightful to see someone question the PPO dogma when many of us have been wondering if the algorithmic overhead was truly justified for LLM fine-tuning, or if we were just trying to fit a square peg in a robotics-shaped hole.
