Reinforcement Learning via Value Gradient Flow

¹UT Austin, ²UC Berkeley
* Equal contribution

TL;DR: Scalable and sample-efficient RL fine-tuning with generative models using Value Gradient Flow.