Guided Flow Policy (GFP) is an offline RL method based on flow matching. It couples a multi-step flow-matching policy trained with value-aware behavior cloning and a distilled one-step actor through a bidirectional guidance mechanism. This enables GFP to achieve state-of-the-art performance across 144 state and pixel-based tasks from the OGBench, Minari, and D4RL benchmarks, with substantial gains on suboptimal datasets and challenging tasks.
This repository was forked from FQL, keeping the overall structure. Compared to the original code base:
- We added Guided Flow Policy in the agent folder.
- We changed the config management system to Hydra, for very convenient configs and command line overrides, see the usage section
- In the results folder, we share csv files of all our benchmarking results. In particular, it includes the 144 tasks GFP was evaluated on (see gfp_results), the extensive reevaluation of existing baselines (e.g. rebrac_results) on OGBench, and the first evaluation of GFP and FQL on Minari.
- In the hyperparams folder, we provide the exact hyperparameters used to generate these results in csv files (e.g. for gfp). For ease of reproduction, our main script recovers from these csv files the best hyperparameters for each task and method. Likewise, for pixel-based environments, it automatically selects the encoder, p_aug and frame_stack.
- We added an option to fuse several training steps together in the
jax.jitcompilation, thereby reducing the overhead (for this,dataset_on_gpu=Trueis needed). By default,n_fused_steps=4. - To reduce the evaluation overhead during training, we implemented a parallel evaluation function, using parallel Gym environments.
Create a Python environment if needed, for instance with conda:
conda create -n gfp python=3.12
conda activate gfpInstall the Jax version suited to your platform following Jax installation guide, for instance:
pip install "jax[cuda13]"Install the other requirements (by default works with OGBench and Minari)
pip install -r requirements.txtWe use Hydra to manage configs and command lines overrides. Given an env_name and an agent, the best hyperparameters are recovered from the hyperparams folder if available. Here are some example commands:
# By default: env_name=cube-double-play ; agent=gfp
python main.py
# To precise the environment and the agent
python main.py env_name=antmaze-large-navigate-singletask-task1-v0 agent=gfp
# For a Minari task, and some hyperparameters overrides
python main.py env_name='D4RL/pen/expert-v2' agent.alpha=0.3 agent.eta_temperature=0.000001 offline_steps=200000Using Hydra Multi-run one can sweep over environments, agents or any hyperparameter. Greatly helping hyperparameter search and any tests. Note: for this a hydra launcher may be needed, see Additional options/launcher.
# Sweep over two environments and two agents => launch 4 jobs
python main.py -m env_name=cube-triple-noisy-singletask-task1-v0,humanoidmaze-medium-navigate-singletask-task1-v0 agent=gfp,fql
# === Hyper parameter search ===
# Sweep over the alpha hyperparameter, using 2 seeds => 8 jobs
python main.py -m agent=gfp env_name=cube-double-play-singletask-v0 agent.alpha=3,1,0.3,0.1 seed=$RANDOM,$RANDOM
# Then sweep over the eta hyperparameter, using 2 seeds => 8 jobs
python main.py -m agent=gfp env_name=cube-double-play-singletask-v0 agent.eta_temperature=0.1,0.01,0.001,0.0001 seed=$RANDOM,$RANDOM agent.alpha=1
# Run all 5 sub tasks, each over 8 runs, to collect the final result
python main.py -m agent=gfp env_name=antmaze-large-navigate-singletask-task1-v0,antmaze-large-navigate-singletask-task2-v0,antmaze-large-navigate-singletask-task3-v0,antmaze-large-navigate-singletask-task4-v0,antmaze-large-navigate-singletask-task5-v0 seed=$RANDOM,$RANDOM,$RANDOM,$RANDOM,$RANDOM,$RANDOM,$RANDOM,$RANDOM is_final=TrueWe provide some optional hydra groups: launcher and opt:
- Launcher may be used to override hydra.launcher for multi-runs. We provide an example of how to launch multiple jobs using slurm on a cluster (needs
pip install hydra-submitit-launcher --upgrade). Hydra optional parameters are added with a+:
python main.py +launcher=our_slurm- Opt provides convenient shortcuts for overriding several arguments. For example, light_log.yaml speeds up computation time by reporting only the final success rate:
python main.py +opt=light_logCombining both, a quick hyperparameter search on a cluster can be done using:
python main.py -m +launcher=our_slurm +opt=light_log agent=gfp env_name=cube-double-play-singletask-v0 agent.alpha=3,1,0.3- 🟢 2025-12-03 - Release of the paper on ArXiv
- 🟢 2026-01-26 - Paper accepted at ICLR
- 🟢 2026-03-09 - Code released
@inproceedings{tiofack2026guided,
title = {Guided Flow Policy: Learning from High-Value Actions in Offline Reinforcement Learning},
author = {Franki {Nguimatsia Tiofack} and Theotime {Le Hellard} and Fabian Schramm and Nicolas Perrin-Gilbert and Justin Carpentier},
booktitle = {The Fourteenth International Conference on Learning Representations},
year = {2026},
url = {https://openreview.net/forum?id=EBjy1rmpv0}
}