Python interface for accessing the near real-world offline reinforcement learning (NeoRL) benchmark datasets



This repository is the interface for the offline reinforcement learning benchmark NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning.

The NeoRL repository contains datasets for training, tools for validation, and the corresponding environments for testing trained policies. The current datasets are collected from three open-source environments (CityLearn, FinRL, and the Industrial Benchmark, IB) and three Gym-MuJoCo tasks. We train SAC on each of these domains, then use policies that reach roughly 25%, 50%, and 75% of the highest episode return to generate datasets at three quality levels for each task. Since the action spaces of these domains are continuous, the policy output is the mean and standard deviation of a Gaussian distribution. During data collection, we take the mean of the Gaussian policy with 80% probability and sample from the policy distribution with the remaining 20%, to reflect the mistakes of human operators in real-world systems. The full datasets can be reproduced with this repo. In addition, we provide a sales promotion task.
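
The collection rule described above can be sketched as follows. This is a minimal illustration, not the repository's collection code: `select_action` and its parameters are hypothetical names, and the trained policy is reduced to a fixed per-dimension mean and standard deviation.

```python
import random


def select_action(mean, std, p_mean=0.8, rng=random):
    """Return the Gaussian mean with probability p_mean, otherwise a sample.

    mean, std: per-dimension mean and standard deviation of the policy output.
    """
    if rng.random() < p_mean:
        # Take the deterministic mean action.
        return list(mean)
    # Otherwise draw each action dimension from N(mean, std^2).
    return [rng.gauss(m, s) for m, s in zip(mean, std)]


rng = random.Random(0)
mean, std = [0.5, -0.2], [0.1, 0.3]
actions = [select_action(mean, std, rng=rng) for _ in range(10_000)]
frac_mean = sum(a == mean for a in actions) / len(actions)
print(f"fraction of mean actions: {frac_mean:.2f}")
```

Over many steps, about four in five actions are the policy mean and the rest are noisy samples, which is what injects the operator-like mistakes into the data.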

More about the NeoRL benchmark can be found in the following paper:

Rong-Jun Qin, Songyi Gao, Xingyuan Zhang, Xiong-Hui Chen, Zewen Li, Weinan Zhang, Yang Yu. NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning.

is now accessible at

The benchmark is supported by two additional repos: OfflineRL for training offline RL algorithms and d3pe for offline evaluation. Details for reproducing the benchmark can be found here.

Install NeoRL interface

NeoRL interface can be installed as follows:

git clone
cd neorl
pip install -e .

After installation, the CityLearn, Finance, Industrial Benchmark, and sales promotion environments will be available. To use MuJoCo in your tasks, you need to obtain a license and follow the setup instructions, and then run:

pip install -e .[mujoco]

So far "HalfCheetah-v3", "Walker2d-v3", and "Hopper-v3" are supported within MuJoCo.

Using NeoRL

NeoRL uses the OpenAI Gym API. Tasks are created via the neorl.make function. A full list of all tasks is available here.

import neorl

# Create an environment
env = neorl.make("citylearn")

# Get 100 trajectories collected by the low-level policy on the citylearn task
train_data, val_data = env.get_dataset(data_type="low", train_num=100)

To support different goals, users can pass a custom reward function to neorl.make() when creating an env. See the usage and examples of neorl.make() for more details.
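
A custom reward function might look like the sketch below. It assumes the function receives a dict of batched entries keyed like the dataset (`obs`, `action`, `next_obs`) and returns one reward per transition. The reward definition, the function name, and the `reward_func` keyword in the commented call are assumptions for illustration, so check the usage of neorl.make() for the exact contract. Plain Python lists keep the sketch dependency-free; the real datasets hold arrays.

```python
def action_penalty_reward(data, weight=0.1):
    """Hypothetical reward: negative squared action magnitude per transition.

    data: dict with at least an "action" entry holding N rows of action values.
    Returns a list of N floats.
    """
    return [-weight * sum(float(a) ** 2 for a in row) for row in data["action"]]


# Synthetic batch of two transitions with 2-dimensional actions.
batch = {
    "obs": [[0.0, 1.0], [1.0, 0.0]],
    "action": [[1.0, 2.0], [0.0, 0.0]],
    "next_obs": [[0.5, 0.5], [1.0, 0.0]],
}
rewards = action_penalty_reward(batch)
print(rewards)

# Assumed usage (not run here, requires neorl to be installed):
# env = neorl.make("citylearn", reward_func=action_penalty_reward)
```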

As a benchmark, each task comes with a small training dataset and a validation dataset by default, so that algorithms can be tested conveniently and quickly. Both are returned by env.get_dataset(). For flexibility, extra parameters can be passed to get_dataset() to obtain multiple pairs of datasets for benchmarking. Each task provides data collected by a low-, medium-, or high-level policy, with up to 10000 training trajectories per task. See the usage of get_dataset() for details on the parameters.
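
Fetching several dataset pairs for a benchmark run could be organized as in the sketch below. It relies only on the `data_type` and `train_num` arguments shown earlier. The helper name `load_dataset_grid`, the level strings "medium" and "high", and the stand-in `FakeEnv` class are assumptions made so the example runs without neorl installed.

```python
def load_dataset_grid(env, levels=("low", "medium", "high"), sizes=(100, 1000)):
    """Fetch one (train, val) pair per policy level and trajectory count."""
    grid = {}
    for level in levels:
        for num in sizes:
            grid[(level, num)] = env.get_dataset(data_type=level, train_num=num)
    return grid


class FakeEnv:
    """Stand-in for a NeoRL env so the sketch runs without neorl installed."""

    def get_dataset(self, data_type="low", train_num=100):
        tag = f"{data_type}-{train_num}"
        return {"name": f"train-{tag}"}, {"name": f"val-{tag}"}


grid = load_dataset_grid(FakeEnv())
train_data, val_data = grid[("medium", 1000)]
print(len(grid), train_data["name"])
```

With a real environment, `FakeEnv()` would be replaced by the object returned from neorl.make().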

Data in NeoRL

In NeoRL, the training data and validation data returned by get_dataset() are dicts with the same format:

  • obs: An N by observation dimensional array of current step's observation.

  • next_obs: An N by observation dimensional array of next step's observation.

  • action: An N by action dimensional array of actions.

  • reward: An N dimensional array of rewards.

  • done: An N dimensional array of episode termination flags.

  • index: An array with one entry per trajectory. Each entry is the position of the first step of a trajectory.
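
The `index` entry makes it possible to cut the flat per-step arrays back into trajectories. The sketch below shows one way to do this on a small synthetic dict that follows the format above. `split_trajectories` is a hypothetical helper, not part of neorl, and it assumes that each trajectory ends where the next one begins. It uses only slicing and `len`, so it works on lists and on arrays alike.

```python
def split_trajectories(data):
    """Split flat per-step entries into a list of per-trajectory dicts.

    data["index"] holds the first step of each trajectory; a trajectory ends
    where the next one begins, and the last one ends at the final step.
    """
    starts = [int(i) for i in data["index"]]
    ends = starts[1:] + [len(data["reward"])]
    keys = [k for k in data if k != "index"]
    return [{k: data[k][s:e] for k in keys} for s, e in zip(starts, ends)]


# Synthetic dataset: 5 steps forming two trajectories (steps 0-2 and 3-4).
data = {
    "obs": [[0.0], [1.0], [2.0], [0.0], [1.0]],
    "next_obs": [[1.0], [2.0], [3.0], [1.0], [2.0]],
    "action": [[0.1], [0.2], [0.3], [0.4], [0.5]],
    "reward": [1.0, 1.0, 1.0, 2.0, 2.0],
    "done": [0, 0, 1, 0, 1],
    "index": [0, 3],
}
trajectories = split_trajectories(data)
returns = [sum(t["reward"]) for t in trajectories]
print(returns)  # [3.0, 4.0]
```

Per-trajectory returns computed this way give a quick check of how the low, medium, and high datasets differ in quality.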


Reference

  • CityLearn: Vázquez-Canteli J R, Kämpf J, Henze G, et al. "CityLearn v1.0: An OpenAI Gym Environment for Demand Response with Deep Reinforcement Learning." Proceedings of the 6th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, pp. 356-357, 2019.
  • FinRL: Liu X Y, Yang H, Chen Q, et al. "FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance." arXiv preprint arXiv:2011.09607, 2020.
  • Industrial Benchmark: Hein D, Depeweg S, Tokic M, et al. "A Benchmark Environment Motivated by Industrial Control Problems." Proceedings of the 2017 IEEE Symposium Series on Computational Intelligence, pp. 1-8, 2017.
  • MuJoCo: Todorov E, Erez T, Tassa Y. "MuJoCo: A Physics Engine for Model-based Control." Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026-5033, 2012.


Licenses

All datasets are licensed under the Creative Commons Attribution 4.0 License (CC BY), and code is licensed under the Apache 2.0 License.