This repo generates the data for NeoRL.

OfflineData

OfflineData is the repository used to train policies and generate datasets for the NeoRL benchmarks.

Install OfflineData

1. Install offlinedata

git clone https://agit.ai/Polixir/OfflineData.git
cd OfflineData
pip install -e .

2. Install neorl

Install neorl to get the environments:

git clone https://agit.ai/Polixir/NeoRL.git
cd NeoRL
pip install -e .

3. Install tianshou

We use tianshou, a popular RL framework, to train the behavioral policies. Install the pinned version with pip:

pip install tianshou==0.4.2

If you use the MuJoCo environments, make sure that mujoco and mujoco_py are installed.
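Because the training scripts depend on a pinned tianshou release, it can help to confirm which version is installed before training. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `tianshou_matches_pin` are hypothetical and not part of this repository.

```python
from importlib import metadata
from typing import Optional

PINNED_TIANSHOU = "0.4.2"  # version pinned in the install step above


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def tianshou_matches_pin(found: Optional[str], pinned: str = PINNED_TIANSHOU) -> bool:
    """True only when a version was found and it equals the pinned one exactly."""
    return found is not None and found == pinned


found = installed_version("tianshou")
if tianshou_matches_pin(found):
    print("tianshou", found, "matches the pinned version")
else:
    print("tianshou version is", found, "but", PINNED_TIANSHOU, "is expected")
```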

Envs

You can use neorl to create any of the standardized environments, for example:

import neorl

env = neorl.make("halfcheetah-meidum-v3")

env = neorl.make("citylearn")

The following environments are currently available:

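| Env Name | Observation shape | Action shape | Has done | Max timesteps |
| --- | --- | --- | --- | --- |
| HalfCheetah-v3 | 18 | 6 | False | 1000 |
| Hopper-v3 | 12 | 3 | True | 1000 |
| Walker2d-v3 | 18 | 6 | True | 1000 |
| ib | 182 | 3 | False | 1000 |
| finance | 181 | 30 | False | 2516 |
| citylearn | 74 | 14 | False | 1000 |
| waterworks | 14 | 4 | False | 287 |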
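The "done" and "max timesteps" columns matter when writing a rollout loop: an environment that never signals done has to be cut off at its step limit. The sketch below illustrates this, assuming the environment follows the classic Gym interface (`reset()` returns an observation, `step(action)` returns `(obs, reward, done, info)`). `CountingEnv` and `rollout` are hypothetical stand-ins so the snippet runs without neorl; with neorl installed, an environment created by `neorl.make(...)` would presumably take the stub's place.

```python
from typing import Callable, List, Tuple


class CountingEnv:
    """Hypothetical stub env: the observation is the step count, the reward is always 1.0.

    It never signals done, like the rows marked False in the table.
    """

    def reset(self) -> List[float]:
        self.t = 0
        return [0.0]

    def step(self, action: List[float]) -> Tuple[List[float], float, bool, dict]:
        self.t += 1
        return [float(self.t)], 1.0, False, {}


def rollout(env, policy: Callable, max_timesteps: int) -> Tuple[float, int]:
    """Run one episode, stopping on done or after max_timesteps steps.

    Returns (trajectory return, number of steps taken).
    """
    obs = env.reset()
    total_reward, steps = 0.0, 0
    for _ in range(max_timesteps):
        obs, reward, done, _info = env.step(policy(obs))
        total_reward += reward
        steps += 1
        if done:
            break
    return total_reward, steps


# 287 is the step limit listed for waterworks; the policy here is a constant placeholder.
ret, steps = rollout(CountingEnv(), policy=lambda obs: [0.0], max_timesteps=287)
print(ret, steps)
```

The trajectory return computed this way is the sum of rewards over one episode.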

Usage

1. Train policy

python get_model.py --task env_name

The policy models, labelled by their trajectory return, are saved in models. Some pre-trained models are already included in that folder.

get_model.py trains with SAC. To train with PPO instead, use get_model_ppo.py.

2. Sample data

You can also skip the first step and sample data using our pre-saved policy models.

(1) Deterministic policy sampling:

python get_data.py --task env_name

(2) Stochastic policy sampling:

python get_data.py --task env_name --add_noise

The datasets are saved in datasets.
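The difference between the two sampling modes can be illustrated with a small sketch. One common way to make a deterministic policy stochastic is to add Gaussian noise to each action and clip the result to the action bounds. Whether `--add_noise` does exactly this is not stated here, so treat the helper `add_gaussian_noise`, the noise scale `sigma`, and the `[-1, 1]` bounds as assumptions for illustration only.

```python
import random
from typing import List, Optional, Sequence


def add_gaussian_noise(
    action: Sequence[float],
    sigma: float,
    low: float = -1.0,   # assumed lower action bound
    high: float = 1.0,   # assumed upper action bound
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Perturb each action dimension with N(0, sigma^2) noise, then clip to [low, high]."""
    source = rng if rng is not None else random
    return [min(high, max(low, a + source.gauss(0.0, sigma))) for a in action]


def deterministic_policy(obs: Sequence[float]) -> List[float]:
    """Hypothetical placeholder policy that always returns the same action."""
    return [0.5, -0.5]


obs = [0.0, 0.0]
rng = random.Random(0)  # seeded so repeated runs draw the same noise

clean_action = deterministic_policy(obs)                                 # deterministic sampling
noisy_action = add_gaussian_noise(clean_action, sigma=0.1, rng=rng)      # stochastic sampling
print(clean_action, noisy_action)
```

Noisy actions widen the state-action coverage of the collected data compared with replaying the deterministic policy alone.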