This repository is the interface for the offline reinforcement learning benchmark NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning.
The NeoRL benchmark contains environments, datasets, and reward functions for training and benchmarking offline reinforcement learning algorithms. The current benchmark contains environments from CityLearn, FinRL, the industrial benchmark (IB), and three Gym-MuJoCo tasks.
More about the NeoRL benchmark can be found at http://polixir.ai/research/neorl and in the following paper:
Rongjun Qin, Songyi Gao, Xingyuan Zhang, Zhen Xu, Shengkai Huang, Zewen Li, Weinan Zhang, Yang Yu. NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning. https://arxiv.org/abs/2102.00714
The NeoRL interface can be installed as follows:

```
git clone https://agit.ai/Polixir/neorl.git
cd neorl
pip install -e .
```
After installation, CityLearn, Finance, and the industrial benchmark will be available. If you want to use the MuJoCo tasks, you need to obtain a MuJoCo license and follow its setup instructions, and then run:

```
pip install -e .[mujoco]
```
So far, "HalfCheetah-v3", "Walker2d-v3", and "Hopper-v3" are supported MuJoCo tasks.
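Once the MuJoCo extra is installed, these tasks can be created through the same `neorl.make()` entry point as the other environments, for example:

```python
import neorl

# Create one of the supported MuJoCo locomotion tasks
env = neorl.make("HalfCheetah-v3")
env.reset()
```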
```python
import neorl

# Create an environment
env = neorl.make("citylearn")
env.reset()
env.step(env.action_space.sample())

# Get 100 trajectories collected by a low-level policy on the citylearn task
train_data, val_data = env.get_dataset(data_type="low", train_num=100)
```
To facilitate setting different goals, users can provide a custom reward function to `neorl.make()` when creating an env. See the usage and examples of `neorl.make()` for more details.
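As a rough illustration, the sketch below passes a custom reward when creating the environment. The keyword name `reward_func` and the signature of the reward callable are assumptions for illustration only; check the documentation of `neorl.make()` for the actual parameter name and expected signature.

```python
import neorl

def energy_penalty_reward(data):
    # ASSUMED signature: `data` exposes the transition fields used in the
    # NeoRL dataset format (obs, action, next_obs); returns a scalar reward.
    action = data["action"]
    return -0.1 * float((action ** 2).sum())

# `reward_func` is a hypothetical keyword name, not confirmed by this README.
env = neorl.make("citylearn", reward_func=energy_penalty_reward)
```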
As a benchmark, to make testing algorithms convenient and quick, each task is associated with a small training dataset and a validation dataset by default. They can be obtained via `env.get_dataset()`. Meanwhile, for flexibility, extra parameters can be passed into `get_dataset()` to get multiple pairs of datasets for benchmarking. Each task's data is collected by a low-, medium-, or high-level policy, and for each task we provide training data for at most 10,000 trajectories. See the usage of `get_dataset()` for more details about its parameters.
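For example, the `data_type` and `train_num` parameters shown earlier can be varied to pull datasets of different policy quality and size (any further parameters are documented with `get_dataset()` itself):

```python
# Default small training/validation pair for quick testing
train_data, val_data = env.get_dataset()

# Datasets collected by policies of different quality levels
for quality in ("low", "medium", "high"):
    train_data, val_data = env.get_dataset(data_type=quality, train_num=1000)
```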
In NeoRL, the training data and validation data returned by the `get_dataset()` function are dicts with the same format:

- `obs`: an N by observation-dimension array of current-step observations.
- `next_obs`: an N by observation-dimension array of next-step observations.
- `action`: an N by action-dimension array of actions.
- `reward`: an N-dimensional array of rewards.
- `done`: an N-dimensional array of episode-termination flags.
- `index`: an array with one entry per trajectory; each number marks the position where a trajectory begins.
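Since the transition arrays are flat, `index` can be used to split them back into per-trajectory chunks. The sketch below assumes `index` holds the starting row of each trajectory, as described above:

```python
import numpy as np

train_data, val_data = env.get_dataset(data_type="low", train_num=100)

# Each trajectory spans [start, end) rows in the flat arrays.
starts = np.asarray(train_data["index"])
ends = np.append(starts[1:], len(train_data["obs"]))

trajectories = [
    {key: train_data[key][s:e]
     for key in ("obs", "next_obs", "action", "reward", "done")}
    for s, e in zip(starts, ends)
]
print(f"{len(trajectories)} trajectories; first has {len(trajectories[0]['obs'])} steps")
```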