Introduction

In this tutorial, we will show how to use PyTupli to set up an efficient pipeline for offline reinforcement learning (RL) for a custom environment. This includes

  • creating a benchmark and uploading it to an instance of a TupliStorage,

  • re-loading this benchmark from the storage,

  • recording RL tuples (state, action, reward, and episode-end flags) for this benchmark and uploading them to the storage,

  • creating a dataset from the stored episodes, and

  • training an offline RL agent using d3rlpy.

The last step is optional. To run it, install the d3rlpy library with pip install d3rlpy.
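
Because the d3rlpy step is optional, it can help to check up front whether the package is available. The following is a minimal sketch using only the standard library; the flag name `HAS_D3RLPY` is an illustrative choice and not part of PyTupli.

```python
import importlib.util

# True if d3rlpy can be imported in the current environment, False otherwise.
HAS_D3RLPY = importlib.util.find_spec("d3rlpy") is not None

if not HAS_D3RLPY:
    print("d3rlpy not found; skip the training section or run `pip install d3rlpy`.")
```

Note that the first code cell imports d3rlpy unconditionally, so those imports would have to be guarded with such a flag if you skip the training part.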

[1]:
import io
import os
from pathlib import Path
import tempfile
import math
from typing import Optional

import numpy as np
import pandas as pd

from gymnasium import spaces
from gymnasium.envs.classic_control import utils, MountainCarEnv
from gymnasium.wrappers import TimeLimit

import d3rlpy
from d3rlpy.algos import DiscreteCQLConfig
from d3rlpy.dataset import MDPDataset

from pytupli.benchmark import TupliEnvWrapper
from pytupli.storage import TupliAPIClient, TupliStorage, FileStorage
from gymnasium import Env
from pytupli.schema import ArtifactMetadata, FilterEQ, EpisodeMetadataCallback
from pytupli.dataset import TupliDataset, NumpyTupleParser


# locate repository / tutorials folder robustly
def find_repo_root():
    p = Path.cwd()
    for parent in [p] + list(p.parents):
        if (parent / '.git').exists():
            return parent
    try:
        import pytupli

        return Path(pytupli.__file__).resolve().parent.parent
    except Exception:
        return Path.cwd()


REPO_ROOT = find_repo_root()
TUTORIALS_ROOT = REPO_ROOT / 'docs' / 'source' / 'tutorials'
# canonical data / model locations (relative to tutorials folder)
DATA_DIR = TUTORIALS_ROOT / 'data'
DATA_PATH = DATA_DIR / 'wind_data.csv'
# fall back to strings when passing into APIs that expect str
DATA_PATH = str(DATA_PATH)

PyTupli has two storage options: a local FileStorage, and the TupliAPIClient, which uses MongoDB as a backend. You can run this notebook with either storage type by adjusting the flag below. If you want to use the TupliAPIClient, follow the instructions in the Readme to start the application.

[2]:
STORAGE_FLAG = 'file'  # "api"

Creating a Custom Environment

We will use the MountainCar example from gymnasium with a small modification: the car is slowed down by wind in the horizontal direction. We load the wind data from a CSV file.

[3]:
class CustomMountainCarEnv(MountainCarEnv):
    """
    ## Description

    The Mountain Car MDP is a deterministic MDP that consists of a car placed stochastically
    at the bottom of a sinusoidal valley, with the only possible actions being the accelerations
    that can be applied to the car in either direction. The goal of the MDP is to strategically
    accelerate the car to reach the goal state on top of the right hill. There are two versions
    of the mountain car domain in gymnasium: one with discrete actions and one with continuous.
    This version is the one with discrete actions.

    This MDP first appeared in [Andrew Moore's PhD Thesis (1990)](https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-209.pdf)

    ```
    @TECHREPORT{Moore90efficientmemory-based,
        author = {Andrew William Moore},
        title = {Efficient Memory-based Learning for Robot Control},
        institution = {University of Cambridge},
        year = {1990}
    }
    ```

    ## Observation Space

    The observation is a `ndarray` with shape `(2,)` where the elements correspond to the following:

    | Num | Observation                          | Min   | Max  | Unit         |
    |-----|--------------------------------------|-------|------|--------------|
    | 0   | position of the car along the x-axis | -1.2  | 0.6  | position (m) |
    | 1   | velocity of the car                  | -0.07 | 0.07 | velocity (v) |

    ## Action Space

    There are 3 discrete deterministic actions:

    - 0: Accelerate to the left
    - 1: Don't accelerate
    - 2: Accelerate to the right

    ## Transition Dynamics:

    Given an action, the mountain car follows the following transition dynamics:

    *velocity<sub>t+1</sub> = velocity<sub>t</sub> + (action - 1) * force - cos(3 * position<sub>t</sub>) * gravity - wind<sub>t</sub> * cos(position<sub>t</sub>)*

    *position<sub>t+1</sub> = position<sub>t</sub> + velocity<sub>t+1</sub>*

    where force = 0.001, gravity = 0.0025, and wind<sub>t</sub> is the wind value for step t loaded from the csv file
    (scaled by 0.01). The collisions at either end are inelastic with the velocity set to 0
    upon collision with the wall. The position is clipped to the range `[-1.2, 0.6]` and
    velocity is clipped to the range `[-0.07, 0.07]`.

    ## Reward:

    The goal is to reach the flag placed on top of the right hill as quickly as possible, as such the agent is
    penalised with a reward of -1 for each timestep.

    ## Starting State

    The position of the car is assigned a uniform random value in *[-0.6 , -0.4]*.
    The starting velocity of the car is always assigned to 0.

    ## Episode End

    The episode ends if either of the following happens:
    1. Termination: The position of the car is greater than or equal to 0.5 (the goal position on top of the right hill)
    2. Truncation: The episode reaches the step limit of the `TimeLimit` wrapper (999 in this tutorial).

    ## Arguments

    Mountain Car has two parameters for `gymnasium.make` with `render_mode` and `goal_velocity`.
    On reset, the `options` parameter allows the user to change the bounds used to determine the new random state.

    ```python
    >>> import gymnasium as gym
    >>> env = gym.make("MountainCar-v0", render_mode="rgb_array", goal_velocity=0.1)  # default goal_velocity=0
    >>> env
    <TimeLimit<OrderEnforcing<PassiveEnvChecker<MountainCarEnv<MountainCar-v0>>>>>
    >>> env.reset(seed=123, options={"x_init": np.pi/2, "y_init": 0.5})  # default x_init=np.pi, y_init=1.0
    (array([-0.46352962,  0.        ], dtype=float32), {})

    ```

    ## Version History

    * v0: Initial versions release
    """

    metadata = {
        'render_modes': ['human', 'rgb_array'],
        'render_fps': 30,
    }

    def __init__(self, data_path: str, render_mode: Optional[str] = None, goal_velocity=0):
        self.min_position = -1.2
        self.max_position = 0.6
        self.max_speed = 0.07
        self.goal_position = 0.5
        self.goal_velocity = goal_velocity
        self.current_step = 0
        self.data = pd.read_csv(data_path, index_col=0, header=None) * 0.01

        self.force = 0.001
        self.gravity = 0.0025

        self.low = np.array([self.min_position, -self.max_speed], dtype=np.float32)
        self.high = np.array([self.max_position, self.max_speed], dtype=np.float32)

        self.render_mode = render_mode

        self.screen_width = 600
        self.screen_height = 400
        self.screen = None
        self.clock = None
        self.isopen = True

        self.action_space = spaces.Discrete(3)
        self.observation_space = spaces.Box(self.low, self.high, dtype=np.float32)

    def step(self, action: int):
        assert self.action_space.contains(action), f'{action!r} ({type(action)}) invalid'

        position, velocity = self.state
        velocity += (
            (action - 1) * self.force
            + math.cos(3 * position) * (-self.gravity)
            - self.data.loc[self.current_step].to_numpy().flatten()[0] * math.cos(position)
        )
        velocity = np.clip(velocity, -self.max_speed, self.max_speed)
        position += velocity
        position = np.clip(position, self.min_position, self.max_position)
        if position == self.min_position and velocity < 0:
            velocity = 0

        terminated = bool(position >= self.goal_position and velocity >= self.goal_velocity)
        reward = -1.0
        self.current_step += 1

        self.state = (position, velocity)
        if self.render_mode == 'human':
            self.render()
        # truncation=False as the time limit is handled by the `TimeLimit` wrapper added during `make`
        return np.array(self.state, dtype=np.float32), reward, terminated, False, {}

    def reset(
        self,
        *,
        seed: Optional[int] = None,
        options: Optional[dict] = None,
    ):
        super().reset(seed=seed)
        # Note that if you use custom reset bounds, it may lead to out-of-bound
        # state/observations.
        self.current_step = 0
        low, high = utils.maybe_parse_reset_bounds(options, -0.6, -0.4)
        self.state = np.array([self.np_random.uniform(low=low, high=high), 0])

        if self.render_mode == 'human':
            self.render()
        return np.array(self.state, dtype=np.float32), {}
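
To make the effect of the wind term easier to inspect, the update inside `step()` can be sketched as a pure function. This is a simplified restatement of the dynamics above, not a replacement for the environment; the function name `wind_step` and the scalar `wind` argument are illustrative assumptions.

```python
import math

FORCE = 0.001
GRAVITY = 0.0025
MAX_SPEED = 0.07
MIN_POSITION, MAX_POSITION = -1.2, 0.6


def wind_step(position, velocity, action, wind):
    """Apply one transition of the wind-modified mountain car dynamics."""
    # Engine force, gravity along the slope, and the wind drag term.
    velocity += (action - 1) * FORCE - math.cos(3 * position) * GRAVITY - wind * math.cos(position)
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position += velocity
    position = max(MIN_POSITION, min(MAX_POSITION, position))
    # Inelastic collision with the left wall.
    if position == MIN_POSITION and velocity < 0:
        velocity = 0.0
    return position, velocity


# Same state and action, with and without wind.
_, v_calm = wind_step(-0.5, 0.0, 1, wind=0.0)
_, v_windy = wind_step(-0.5, 0.0, 1, wind=0.001)
```

For positions where cos(position) is positive, which covers the whole valley, a positive wind value reduces the velocity, so `v_windy` is smaller than `v_calm`.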

Serialize Environment for Upload

As a next step, we want to upload our environment to our storage using PyTupli. For this, we will detach the CSV file from the environment, upload it separately, and replace the data attribute in the environment with the id of the stored object. This allows us to re-use artifacts such as CSV files in multiple benchmarks. For example, consider a case where you only want to change one parameter within the environment, e.g., the maximum speed. You would have to create a new benchmark, but could re-use the CSV file! PyTupli automatically recognizes such duplicates.
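
One common way to recognize duplicate artifacts is to derive the identifier from a hash of the content. The sketch below illustrates that idea with a hypothetical in-memory store; how PyTupli detects duplicates internally is not shown in this tutorial, so both the class `SketchArtifactStore` and the use of SHA-256 are assumptions.

```python
import hashlib


class SketchArtifactStore:
    """Hypothetical in-memory store; not PyTupli's implementation."""

    def __init__(self):
        self._by_hash = {}

    def store_artifact(self, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        # Identical bytes map to the same id, so a re-upload stores nothing new.
        self._by_hash.setdefault(digest, content)
        return digest


store = SketchArtifactStore()
first_id = store.store_artifact(b"0,0.1\n1,0.2\n")
second_id = store.store_artifact(b"0,0.1\n1,0.2\n")
```

Here both uploads return the same id and the store holds a single copy, which is what allows two benchmarks to reference one CSV file.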

To separate the CSV file, we have to subclass the TupliEnvWrapper class and override the _serialize() and _deserialize() methods. The TupliEnvWrapper is essentially a gymnasium wrapper that records RL tuples in the step() function, but it also provides additional functionality for interacting with the storage.

[4]:
class MyTupliEnvWrapper(TupliEnvWrapper):
    def _serialize(self, env) -> tuple[Env, list]:
        related_data_sources = []
        ds = env.unwrapped.data
        metadata = ArtifactMetadata(name='test')
        data_kwargs = {'header': None}
        try:
            content = ds.to_csv(encoding='utf-8', **data_kwargs)
            content = content.encode(encoding='utf-8')
        except Exception as e:
            raise ValueError(f'Failed to serialize data source: {e}') from e

        ds_storage_metadata = self.storage.store_artifact(artifact=content, metadata=metadata)
        related_data_sources.append(ds_storage_metadata.id)
        setattr(env.unwrapped, 'data', ds_storage_metadata.id)
        return env, related_data_sources

    @classmethod
    def _deserialize(cls, env: Env, storage: TupliStorage) -> Env:
        data_kwargs = {'header': None, 'index_col': 0}
        ds = storage.load_artifact(env.unwrapped.data)
        ds = ds.decode('utf-8')
        d = io.StringIO(ds)
        df = pd.read_csv(d, **data_kwargs)

        env.unwrapped.data = df
        return env
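
The serialization above relies on the CSV round trip preserving the step index, because `step()` looks up the wind value with `self.data.loc[self.current_step]`. The following sketch checks this round trip in isolation with pandas only; the sample values are made up, and `header=False` is used on the writing side as an explicit way to omit the header row.

```python
import io

import pandas as pd

# Made-up wind values indexed by time step.
wind = pd.DataFrame({1: [0.1, 0.2, 0.3]}, index=[0, 1, 2])

# DataFrame -> CSV bytes, as it would be handed to a storage backend.
payload = wind.to_csv(header=False).encode('utf-8')

# CSV bytes -> DataFrame, with the first column restored as the index.
restored = pd.read_csv(io.StringIO(payload.decode('utf-8')), header=None, index_col=0)

# Lookup by step, mirroring the access pattern used in step().
value_at_step_1 = restored.loc[1].to_numpy().flatten()[0]
```

If `index_col=0` were omitted when reading, the step index would become an ordinary data column and the lookup by step would return the wrong value.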
[5]:
# which storage to use
if STORAGE_FLAG == 'api':
    storage = TupliAPIClient()
elif STORAGE_FLAG == 'file':
    storage = FileStorage()
else:
    raise ValueError(f"Unknown storage flag: {STORAGE_FLAG}. Has to be 'api' or 'file'.")
[6]:
# instantiate the environment
max_eps_length = 999
env = TimeLimit(
    CustomMountainCarEnv(render_mode=None, data_path=DATA_PATH), max_episode_steps=max_eps_length
)
# Now we can create the benchmark
tupli_env = MyTupliEnvWrapper(env, storage=storage)

Uploading and Downloading Benchmarks

We will now upload the benchmark and download it again.

[7]:
tupli_env.store(name='mountain-car-v0', description='Mountain Car v0 benchmark')

Let us list the uploaded benchmarks:

[8]:
storage.list_benchmarks()
[8]:
[BenchmarkHeader(id='ca2fd26d462d465d8ee41d64b042c2a1', created_by='local_storage', published_in=['local_storage'], created_at='2026-04-15T10:35:17.129690', hash='7f9d9e861a6eebd7627f825d2bc8ca8ae0a6d300db4a193998df01d511c08019', metadata=BenchmarkMetadata(name='mountain-car-v0', description='Mountain Car v0 benchmark', difficulty=None, version=None, extra={}))]

As a next step, we show how to download the benchmark. Note that this is only for demonstration purposes! When loading the benchmark, we can pass a callback that will later be used to add metadata to recorded episodes. We provide a simple example of such a callback.

[9]:
class MyCallback(EpisodeMetadataCallback):
    def __init__(self):
        super().__init__()
        # we will compute the cumulative reward for an episode
        self.cum_reward = 0
        # Furthermore, we want to store the fact that the episode was not an expert episode
        self.is_expert = False

    def reset(self):
        # reset the cumulative reward at the start of a new episode
        self.cum_reward = 0

    def __call__(self, tuple):
        self.cum_reward += tuple.reward
        return {'cum_eps_reward': [self.cum_reward], 'is_expert': self.is_expert}
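
To see how such a callback accumulates metadata, it can be driven by hand. The sketch below is standalone and does not use PyTupli; `SketchTuple` and `SketchCallback` are hypothetical stand-ins, and it assumes that the wrapper calls the callback once per recorded step and calls `reset()` between episodes.

```python
from dataclasses import dataclass


@dataclass
class SketchTuple:
    """Hypothetical stand-in for a recorded RL tuple; only the reward is needed here."""

    reward: float


class SketchCallback:
    def __init__(self):
        self.cum_reward = 0.0

    def reset(self):
        self.cum_reward = 0.0

    def __call__(self, rl_tuple):
        self.cum_reward += rl_tuple.reward
        return {'cum_eps_reward': [self.cum_reward]}


callback = SketchCallback()
metadata = {}
# Three steps with reward -1 each, as in the mountain car environment.
for reward in (-1.0, -1.0, -1.0):
    metadata = callback(SketchTuple(reward=reward))

# Start the next episode from a clean state.
callback.reset()
```

After the three steps, `metadata` holds the cumulative reward of the episode so far, and the reset prevents it from leaking into the next episode.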
[10]:
loaded_tupli_env = MyTupliEnvWrapper.load(
    storage=storage, benchmark_id=tupli_env.id, metadata_callback=MyCallback()
)

Recording Episodes for Offline RL Training

The TupliEnvWrapper allows us to record all interactions with the custom environment to the storage in the form of tuples (state, action, reward, terminal, timeout). These can then be used to train an offline RL agent for this environment with any offline RL library. For simplicity, we will use a random policy to generate the data.

[11]:
# For reproducibility when generating episodes
np.random.seed(42)
obs, info = loaded_tupli_env.reset(seed=42)

for step in range(2000):
    action = np.int64(np.random.randint(low=0, high=3))
    obs, reward, done, truncated, info = loaded_tupli_env.step(action)
    if done or truncated:
        # `step` keeps counting across episodes, so this prints the cumulative number of timesteps
        print(f'Episode finished after {step + 1} timesteps')
        obs, info = loaded_tupli_env.reset()
Episode finished after 999 timesteps
Episode finished after 1998 timesteps
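
Both episodes above end because of the time limit, not because the car reached the goal. The sketch below shows how the two end-of-episode signals of a gymnasium-style step can be kept apart as `terminal` and `timeout` flags when recording tuples. It uses a toy step function and a hypothetical `SketchRLTuple`; it is not PyTupli's recording code, and which observation the wrapper stores per tuple is not assumed here.

```python
from dataclasses import dataclass


@dataclass
class SketchRLTuple:
    """Hypothetical record for one environment interaction."""

    state: list
    action: int
    reward: float
    terminal: bool  # the environment reached a final state
    timeout: bool   # the episode was cut off by a step limit


def toy_step(state, action):
    """Toy dynamics: never terminates, truncates after three steps."""
    next_state = state + 1
    terminated = False
    truncated = next_state >= 3
    return next_state, -1.0, terminated, truncated


episode = []
state = 0
while True:
    action = 1
    next_state, reward, terminated, truncated = toy_step(state, action)
    episode.append(
        SketchRLTuple(
            state=[float(next_state)],
            action=action,
            reward=reward,
            terminal=terminated,
            timeout=truncated,
        )
    )
    state = next_state
    if terminated or truncated:
        break
```

Keeping the flags separate matters for offline RL, because a timeout does not mean that the final state has zero future value.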

Downloading Episodes for a Benchmark

Next, let us download all episodes that have been recorded for our benchmark. For this, we create a TupliDataset using a filter with the id of the benchmark.

[12]:
# Create dataset
mdp_dataset = TupliDataset(storage=storage).with_benchmark_filter(
    FilterEQ(key='id', value=loaded_tupli_env.id)
)
mdp_dataset.load()
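
Conceptually, an equality filter keeps only the records whose field matches a given value. The sketch below shows this on plain dictionaries; the helper `filter_eq` and the sample records are hypothetical, and it does not reflect how TupliDataset or the storage backends evaluate filters internally.

```python
def filter_eq(items, key, value):
    """Return the items whose `key` entry equals `value`."""
    return [item for item in items if item.get(key) == value]


# Made-up episode headers belonging to two different benchmarks.
episodes = [
    {'id': 'ep-1', 'benchmark_id': 'bench-a'},
    {'id': 'ep-2', 'benchmark_id': 'bench-b'},
    {'id': 'ep-3', 'benchmark_id': 'bench-a'},
]

selected = filter_eq(episodes, key='benchmark_id', value='bench-a')
selected_ids = [episode['id'] for episode in selected]
```

Only the episodes recorded for `bench-a` remain, which mirrors selecting all episodes of one benchmark.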

We can show the contents of the dataset using the preview() method:

[13]:
mdp_dataset.preview()
[13]:
[EpisodeItem(id='d5f8099653ac429d9f901a03b6fa5e70', created_by='local_storage', published_in=['local_storage'], created_at='2026-04-15T10:35:26.637788', benchmark_id='ca2fd26d462d465d8ee41d64b042c2a1', metadata={'cum_eps_reward': [-999.0], 'is_expert': False}, n_tuples=999, terminated=False, timeout=True, tuples=[RLTuple(state=[-0.5113905072212219, 0.0008338160114362836], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5116723775863647, -0.0002818846551235765], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5110931396484375, 0.0005792493466287851], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5096522569656372, 0.0014408753486350179], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5093242526054382, 0.000327992340316996], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5101639032363892, -0.0008396401535719633], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5101851224899292, -2.124540878867265e-05], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5103291869163513, -0.0001440104970242828], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.509620726108551, 0.0007084066164679825], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5080977082252502, 0.0015230465214699507], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5057256817817688, 0.002372018527239561], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5025493502616882, 0.0031763564329594374], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.500569760799408, 0.0019795973785221577], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4977911710739136, 0.0027785508427768946], action=2, 
reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4952910244464874, 0.002500171773135662], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4940441846847534, 0.001246814732439816], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4930240511894226, 0.0010201389668509364], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4923064112663269, 0.0007176604704000056], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.49190738797187805, 0.00039900271804071963], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.49182963371276855, 7.777586142765358e-05], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4930543899536133, -0.0012247838312759995], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4955599009990692, -0.002505498006939888], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.49830546975135803, -0.0027455675881356], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5013017058372498, -0.002996222348883748], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5055070519447327, -0.004205367062240839], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.510878324508667, -0.005371248349547386], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5173525810241699, -0.006474245805293322], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5229570269584656, -0.005604492034763098], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5276248455047607, -0.004667783621698618], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5312625169754028, -0.003637672169134021], action=2, reward=-1.0, 
info={}, terminal=False, timeout=False), RLTuple(state=[-0.5348554253578186, -0.0035929519217461348], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5374233722686768, -0.0025679245591163635], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5399391651153564, -0.002515781205147505], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5423727631568909, -0.0024336320348083973], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5437002182006836, -0.0013273990480229259], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5449283719062805, -0.0012281722156330943], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5450786352157593, -0.00015025046013761312], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.544078528881073, 0.0010000934125855565], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5440093278884888, 6.916956044733524e-05], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5428723692893982, 0.0011369569692760706], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5426400899887085, 0.00023230024089571089], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5412861108779907, 0.0013539785286411643], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5388557314872742, 0.002430384047329426], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5373508334159851, 0.0015049121575430036], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5367832183837891, 0.0005675791180692613], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5351337194442749, 0.0016495289746671915], action=2, reward=-1.0, info={}, 
terminal=False, timeout=False), RLTuple(state=[-0.5334744453430176, 0.001659264788031578], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5327448844909668, 0.0007295446703210473], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5320248603820801, 0.0007200646796263754], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5313005447387695, 0.0007242888677865267], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5305811762809753, 0.0007193594938144088], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5308516025543213, -0.0002704005455598235], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5310946106910706, -0.00024304342514369637], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5323395729064941, -0.0012449665227904916], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5335495471954346, -0.0012099361047148705], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5337236523628235, -0.00017413827299606055], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5328255295753479, 0.0008981470600701869], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5328859686851501, -6.0470742027973756e-05], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5318993926048279, 0.0009865766623988748], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5298661589622498, 0.0020332620479166508], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5277872681617737, 0.0020788954570889473], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5267308354377747, 0.0010564220137894154], action=0, reward=-1.0, info={}, 
terminal=False, timeout=False), RLTuple(state=[-0.5256749391555786, 0.00105590361636132], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5246210098266602, 0.001053902436979115], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5236331224441528, 0.0009878887794911861], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5226479768753052, 0.0009851761860772967], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5217301249504089, 0.0009178569889627397], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5208677649497986, 0.0008623460889793932], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5200468301773071, 0.0008209246443584561], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5202727913856506, -0.00022597238421440125], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5195737481117249, 0.0006990367546677589], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5189687013626099, 0.000605061708483845], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5184198617935181, 0.0005488255410455167], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5179292559623718, 0.0004906049580313265], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5175549387931824, 0.00037433297256939113], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.517245352268219, 0.000309614697471261], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5170034170150757, 0.0002419227676000446], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.515887975692749, 0.0011154473759233952], action=2, reward=-1.0, info={}, terminal=False, 
timeout=False), RLTuple(state=[-0.5138584971427917, 0.0020294473506510258], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5119637846946716, 0.0018947572680190206], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5091778039932251, 0.0027859758120030165], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5075864791870117, 0.0015912812668830156], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5061923265457153, 0.0013941703364253044], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5060040950775146, 0.00018822852871380746], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5070022344589233, -0.0009981263428926468], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5081721544265747, -0.001169910072349012], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5085447430610657, -0.0003726317372638732], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5100646615028381, -0.0015199058689177036], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5117618441581726, -0.0016971792792901397], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5146201848983765, -0.002858339808881283], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5185853838920593, -0.003965205978602171], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.523647129535675, -0.005061722360551357], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5297783017158508, -0.0061311605386435986], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5348930358886719, -0.005114757921546698], action=2, reward=-1.0, info={}, terminal=False, 
timeout=False), RLTuple(state=[-0.5409407019615173, -0.006047682370990515], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5479172468185425, -0.006976546719670296], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5557628870010376, -0.00784562062472105], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5624416470527649, -0.006678763311356306], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5699036121368408, -0.0074619329534471035], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5780211091041565, -0.008117523044347763], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5847542881965637, -0.006733175832778215], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5900659561157227, -0.005311679095029831], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5938848853111267, -0.003818926867097616], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5982547402381897, -0.00436982698738575], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6011183857917786, -0.0028636441566050053], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6024767756462097, -0.0013584232656285167], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.604319155216217, -0.0018423564033582807], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6045958995819092, -0.00027673502336256206], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6053089499473572, -0.0007130824378691614], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6054692268371582, -0.0001602756092324853], action=1, reward=-1.0, info={}, terminal=False, 
timeout=False), RLTuple(state=[-0.6040581464767456, 0.001411092234775424], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6020604968070984, 0.0019976769108325243], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6005573868751526, 0.0015030945651233196], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5975644588470459, 0.0029929226730018854], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5950524210929871, 0.0025120568461716175], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5920725464820862, 0.002979846205562353], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5896499156951904, 0.0024226270616054535], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5858072638511658, 0.0038426443934440613], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5805745124816895, 0.005232765804976225], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5749903917312622, 0.0055841002613306046], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5700504183769226, 0.00493996636942029], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5637997388839722, 0.006250693462789059], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5573214292526245, 0.0064783054403960705], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5496217608451843, 0.007699707057327032], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5407589673995972, 0.008862749673426151], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5328400135040283, 0.007918983697891235], action=0, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.5239173173904419, 0.008922661654651165], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5160330533981323, 0.007884295657277107], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5072849988937378, 0.008748052641749382], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.49874258041381836, 0.00854241568595171], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.48947420716285706, 0.009268364869058132], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4815200865268707, 0.007954118773341179], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.47492921352386475, 0.006590870674699545], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4687783718109131, 0.00615086080506444], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.46207600831985474, 0.006702371872961521], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.45485618710517883, 0.007219808176159859], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.44819074869155884, 0.006665448192507029], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.44116243720054626, 0.007028302177786827], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4338301122188568, 0.007332322653383017], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4281710684299469, 0.005659053102135658], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4222997725009918, 0.005871289409697056], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4161776304244995, 0.006122150458395481], action=2, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.4109116792678833, 0.005265939515084028], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4064755439758301, 0.004436126444488764], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4039517939090729, 0.002523773582652211], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.40135127305984497, 0.0026004991959780455], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3986596167087555, 0.0026916645001620054], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.39588677883148193, 0.002772834151983261], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3951275944709778, 0.0007591942558065057], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3963949978351593, -0.0012674005702137947], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.39862439036369324, -0.0022293967194855213], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4028591811656952, -0.004234800580888987], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.407021164894104, -0.004161973483860493], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.41105911135673523, -0.004037961363792419], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.416995644569397, -0.00593652855604887], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.42280468344688416, -0.0058090281672775745], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.428392618894577, -0.005587928928434849], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.43571794033050537, -0.007325336802750826], action=0, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.4447041451931, -0.008986195549368858], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4533352553844452, -0.008631102740764618], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4615289270877838, -0.008193697780370712], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4691871106624603, -0.007658177521079779], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.47730588912963867, -0.008118769153952599], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.485789954662323, -0.00848408229649067], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.494637131690979, -0.008847154676914215], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5047628283500671, -0.01012569759041071], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5150337219238281, -0.010270931757986546], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5263786911964417, -0.011344928294420242], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5387546420097351, -0.012375985272228718], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.551028311252594, -0.012273619882762432], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5631109476089478, -0.01208264660090208], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5749309062957764, -0.011819968931376934], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5854355096817017, -0.010504623875021935], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5955489873886108, -0.010113446041941643], action=1, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.6041737794876099, -0.008624814450740814], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6132490038871765, -0.009075222536921501], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6207288503646851, -0.007479832042008638], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6275708675384521, -0.006842003203928471], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6347057819366455, -0.007134954910725355], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6420928239822388, -0.007387015037238598], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6496621370315552, -0.007569336332380772], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6553828716278076, -0.0057207439094781876], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6601834893226624, -0.004800579976290464], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6649965643882751, -0.004813105333596468], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6698190569877625, -0.004822499584406614], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6746217608451843, -0.004802709445357323], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6773593425750732, -0.0027375577483326197], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6780165433883667, -0.0006571953999809921], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6775906682014465, 0.0004258534754626453], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6750766634941101, 0.00251401262357831], action=2, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.6725336313247681, 0.0025430740788578987], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6689592003822327, 0.003574422327801585], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6653852462768555, 0.003573961555957794], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6618319153785706, 0.0035532775800675154], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6562931537628174, 0.00553875882178545], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6498505473136902, 0.00644261809065938], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6415277719497681, 0.008322770707309246], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6313640475273132, 0.010163730010390282], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6204244494438171, 0.010939628817141056], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6098325848579407, 0.010591857135295868], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5996437072753906, 0.010188865475356579], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5888920426368713, 0.010751644149422646], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5786861181259155, 0.010205965489149094], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5681355595588684, 0.010550525970757008], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5573152899742126, 0.010820302180945873], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5452965497970581, 0.012018737383186817], action=2, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.5331947803497314, 0.012101748958230019], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5200710296630859, 0.013123788870871067], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5080083012580872, 0.012062697671353817], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4970666170120239, 0.010941681452095509], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4873277544975281, 0.009738857857882977], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4788789451122284, 0.008448814041912556], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4698355495929718, 0.009043391793966293], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.46225249767303467, 0.007583058904856443], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.455206960439682, 0.007045549340546131], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.44868358969688416, 0.006523370277136564], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4427776634693146, 0.005905919708311558], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4365246295928955, 0.0062530189752578735], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.43198490142822266, 0.004539727699011564], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4291447699069977, 0.002840145491063595], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4280281662940979, 0.0011166040785610676], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.42665162682533264, 0.0013765476178377867], action=2, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.4260137677192688, 0.0006378371617756784], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.42610982060432434, -9.603546641301364e-05], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4279322028160095, -0.0018223950173705816], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.43051135540008545, -0.002579153049737215], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.43381431698799133, -0.0033029557671397924], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.43686750531196594, -0.0030531769152730703], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.43958407640457153, -0.002716567600145936], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.44195911288261414, -0.0023750511463731527], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.44401609897613525, -0.002056971425190568], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4477478265762329, -0.003731728531420231], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4510704576969147, -0.003322632983326912], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.45493826270103455, -0.003867819206789136], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.46038684248924255, -0.005448565352708101], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.466317743062973, -0.005930902902036905], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.472736120223999, -0.00641838600859046], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.47955310344696045, -0.006816993001848459], action=1, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.4857166111469269, -0.006163501180708408], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.49120810627937317, -0.0054915002547204494], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4979897439479828, -0.006781619507819414], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5059962868690491, -0.008006556890904903], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5131844878196716, -0.007188220974057913], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5204573273658752, -0.00727279856801033], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5287824273109436, -0.008325126953423023], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5360691547393799, -0.00728671345859766], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5423032641410828, -0.006234143394976854], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5474693179130554, -0.005166038870811462], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5525184869766235, -0.005049147643148899], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5563540458679199, -0.003835570067167282], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5589885115623474, -0.0026344344951212406], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5603809356689453, -0.0013924705563113093], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5605099201202393, -0.000128936895634979], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5614379048347473, -0.0009280064259655774], action=0, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.563127338886261, -0.0016894546570256352], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5635663866996765, -0.0004390247049741447], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5637374520301819, -0.00017105104052461684], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5646784901618958, -0.0009410682250745595], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5643532276153564, 0.0003252687456551939], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5647416114807129, -0.0003884013567585498], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.565868079662323, -0.001126479241065681], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.56675785779953, -0.0008897639345377684], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5663831830024719, 0.0003746908623725176], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5647314190864563, 0.0016517604235559702], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5628189444541931, 0.0019124648533761501], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5606885552406311, 0.0021303952671587467], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5573334097862244, 0.0033551419619470835], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5528077483177185, 0.004525674507021904], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5481144785881042, 0.004693279508501291], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5443128943443298, 0.003801575629040599], action=0, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.5413762927055359, 0.0029366170056164265], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5383337736129761, 0.0030425270088016987], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5362496972084045, 0.0020840547513216734], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5340766310691833, 0.002173063578084111], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.532871663570404, 0.00120497215539217], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5326224565505981, 0.00024917611153796315], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5313883423805237, 0.0012341293040663004], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.529147744178772, 0.0022405870258808136], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5279299020767212, 0.0012178769102320075], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5276857614517212, 0.00024413976643700153], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5264415144920349, 0.0012442480074241757], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.524176836013794, 0.002264658221974969], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5209717750549316, 0.0032050597947090864], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5178144574165344, 0.0031573267187923193], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5157871246337891, 0.002027310198172927], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.512904167175293, 0.0028829844668507576], action=2, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.5111815929412842, 0.0017225535120815039], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5095811486244202, 0.0016004572389647365], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5080946683883667, 0.001486480701714754], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5057552456855774, 0.0023394019808620214], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5036264061927795, 0.0021288844291120768], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5026800632476807, 0.0009462923044338822], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5008928179740906, 0.001787240500561893], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5002910494804382, 0.0006017701234668493], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.500945508480072, -0.0006544392090290785], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5018519759178162, -0.0009064803016372025], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5029844641685486, -0.0011324947699904442], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5043456554412842, -0.0013611391186714172], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.505925178527832, -0.001579570583999157], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5076871514320374, -0.001761933439411223], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5085905194282532, -0.0009034107788465917], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5106569528579712, -0.0020664315670728683], action=0, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.5138859152793884, -0.00322896521538496], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5182275772094727, -0.004341629799455404], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.523683488368988, -0.005455898120999336], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5291428565979004, -0.005459412466734648], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5355896353721619, -0.006446735467761755], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5410110354423523, -0.00542140519246459], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5453211665153503, -0.004310137592256069], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5504854321479797, -0.005164292640984058], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5565241575241089, -0.006038706284016371], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5613296627998352, -0.004805475007742643], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5649281144142151, -0.0035984658170491457], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5682640671730042, -0.003335972549393773], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5703202486038208, -0.0020561886485666037], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5720935463905334, -0.0017732627457007766], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.573533296585083, -0.0014397974591702223], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5746445059776306, -0.0011111772619187832], action=1, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.5754483938217163, -0.000803919683676213], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5759277939796448, -0.0004793908155988902], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5770617723464966, -0.0011339574120938778], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5768375396728516, 0.00022420514142140746], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5762447118759155, 0.0005928356549702585], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5742772221565247, 0.0019675088115036488], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5709463357925415, 0.0033308714628219604], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5672945380210876, 0.003651793347671628], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5643849968910217, 0.002909551141783595], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5612449645996094, 0.0031400241423398256], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5588324666023254, 0.00241253268904984], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5551992654800415, 0.0036332046147435904], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5523994565010071, 0.0027997633442282677], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5504654049873352, 0.0019340969156473875], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5493890643119812, 0.0010763007448986173], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5471386313438416, 0.002250424586236477], action=2, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.5437910556793213, 0.003347601043060422], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5413433909416199, 0.0024476435501128435], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5398003458976746, 0.0015430472558364272], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5381681323051453, 0.0016322402516379952], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5374951958656311, 0.0006729100714437664], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5367227792739868, 0.0007724046008661389], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5369045734405518, -0.00018177952733822167], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5370184779167175, -0.00011388959683245048], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5360886454582214, 0.0009298506774939597], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.536106526851654, -1.7911592294694856e-05], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5370672345161438, -0.000960722507443279], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5389707088470459, -0.001903428928926587], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5417616367340088, -0.002790922299027443], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5444688200950623, -0.002707231556996703], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5480523705482483, -0.0035835467278957367], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5505238771438599, -0.0024714788887649775], action=2, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.5518034100532532, -0.0012795504881069064], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5539410710334778, -0.0021376339718699455], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5549137592315674, -0.0009727295255288482], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5566703677177429, -0.0017566145397722721], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.559230625629425, -0.0025602043606340885], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5605870485305786, -0.0013564479304477572], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5626757740974426, -0.002088740933686495], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.563530683517456, -0.0008548742043785751], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5641184449195862, -0.0005877804360352457], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5654233694076538, -0.0013049028348177671], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5664921402931213, -0.0010688015026971698], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5663163065910339, 0.00017581830616109073], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5658440589904785, 0.00047229931806214154], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5651035308837891, 0.0007405286305584013], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5631201267242432, 0.0019833804108202457], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5608629584312439, 0.0022571678273379803], action=1, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.5583384037017822, 0.002524539828300476], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5546316504478455, 0.003706755582243204], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5507143139839172, 0.003917335532605648], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5456293821334839, 0.005084930919110775], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5404415130615234, 0.00518787419423461], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5341306328773499, 0.006310886237770319], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5268117189407349, 0.007318909280002117], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5195292234420776, 0.0072825015522539616], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5122973322868347, 0.007231913041323423], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5052002668380737, 0.00709706312045455], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.49931231141090393, 0.005887946579605341], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4946846663951874, 0.004627641756087542], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.49132585525512695, 0.0033587913494557142], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.48726338148117065, 0.0040624914690852165], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.483501136302948, 0.003762244712561369], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4791053831577301, 0.004395757801830769], action=2, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.47606050968170166, 0.003044854151085019], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4734346866607666, 0.002625837456434965], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.47025546431541443, 0.003179208841174841], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4674873650074005, 0.0027681151404976845], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4641849994659424, 0.0033023571595549583], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4613819122314453, 0.00280309421941638], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.46011844277381897, 0.0012634715531021357], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.46036064624786377, -0.00024222211504820734], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.46111780405044556, -0.0007571389432996511], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.46140629053115845, -0.00028848336660303175], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4632420837879181, -0.0018358012894168496], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.46655017137527466, -0.0033080868888646364], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.47131696343421936, -0.004766811151057482], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4765413999557495, -0.005224415101110935], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4821544289588928, -0.005613031331449747], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4871441125869751, -0.004989690147340298], action=2, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.49142035841941833, -0.0042762355878949165], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.49494025111198425, -0.0035199050325900316], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4986891448497772, -0.0037488830275833607], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5026981830596924, -0.004009040538221598], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.505939781665802, -0.0032416144385933876], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5083450675010681, -0.0024052471853792667], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5109439492225647, -0.002598899882286787], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5126900672912598, -0.001746137160807848], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5155878067016602, -0.0028977561742067337], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5186110138893127, -0.003023165510967374], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.520710825920105, -0.002099822275340557], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.522884726524353, -0.00217388104647398], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5260918736457825, -0.0032071981113404036], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5293306112289429, -0.003238696837797761], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5325558185577393, -0.003225228050723672], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5357164740562439, -0.0031606468837708235], action=1, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.5388666391372681, -0.0031501937191933393], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.54290771484375, -0.004041068255901337], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5468834042549133, -0.003975661937147379], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5507588386535645, -0.0038754267152398825], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5534729957580566, -0.002714182250201702], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5560405254364014, -0.0025675336364656687], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.557366132736206, -0.0013255986850708723], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5594791173934937, -0.002113001886755228], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5623839497566223, -0.002904785331338644], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5660111904144287, -0.003627269295975566], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5703759789466858, -0.0043647619895637035], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5734575986862183, -0.0030816704966127872], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5762405395507812, -0.002782912692055106], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5796511769294739, -0.0034106578677892685], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.582705557346344, -0.003054338973015547], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5843876004219055, -0.0016820761375129223], action=2, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.5846678018569946, -0.0002801776572596282], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5835671424865723, 0.0011006336426362395], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5830495953559875, 0.0005175855476409197], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5821473598480225, 0.0009022310841828585], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5808501243591309, 0.001297186827287078], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5791434645652771, 0.0017066954169422388], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5760438442230225, 0.0030995907727628946], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5736358165740967, 0.0024080355651676655], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5698796510696411, 0.0037561815697699785], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5647990107536316, 0.0050806389190256596], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.559490978717804, 0.0053080045618116856], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5529444217681885, 0.006546576973050833], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5461798906326294, 0.006764536257833242], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5402976870536804, 0.005882205907255411], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5343128442764282, 0.005984836257994175], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5292903780937195, 0.005022480618208647], action=0, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.5243026614189148, 0.004987719934433699], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5183708667755127, 0.00593177555128932], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5115000605583191, 0.006870820187032223], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5057628750801086, 0.005737191066145897], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5011658072471619, 0.004597056657075882], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.49775704741477966, 0.003408747026696801], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4945470988750458, 0.003209945047274232], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.49157923460006714, 0.0029678710270673037], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4899277687072754, 0.0016514729941263795], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.48757898807525635, 0.002348772482946515], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4855206608772278, 0.002058326965197921], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.48379576206207275, 0.0017249101074412465], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4824362099170685, 0.0013595526106655598], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.48240017890930176, 3.6022913263877854e-05], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4836891293525696, -0.0012889317004010081], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4862784743309021, -0.0025893638376146555], action=0, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.4902329742908478, -0.003954503685235977], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.49549731612205505, -0.0052643269300460815], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5000534057617188, -0.0045560672879219055], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5038300156593323, -0.00377662549726665], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5088151097297668, -0.004985127132385969], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5139462947845459, -0.005131141282618046], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5202324986457825, -0.006286219228059053], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5255771279335022, -0.005344641860574484], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5299603343009949, -0.004383209627121687], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5343713760375977, -0.004411051515489817], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5377740263938904, -0.0034026484936475754], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5420942902565002, -0.004320275504142046], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5463172793388367, -0.004222932271659374], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5514503121376038, -0.00513307424262166], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5574296712875366, -0.005979323294013739], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.563185453414917, -0.0057558161206543446], action=1, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.5697234869003296, -0.006538021378219128], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5749983787536621, -0.0052749160677194595], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.580920934677124, -0.005922546610236168], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5864444375038147, -0.0055235158652067184], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5915267467498779, -0.005082300398498774], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5961740612983704, -0.004647292196750641], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6012952327728271, -0.005121173337101936], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6058599352836609, -0.004564697854220867], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6098852753639221, -0.004025329370051622], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6143296360969543, -0.004444385413080454], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6171301603317261, -0.0028005039785057306], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6203039884567261, -0.003173856297507882], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.622818648815155, -0.0025146373081952333], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6246082186698914, -0.0017895614728331566], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6267284154891968, -0.0021202347707003355], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6271567344665527, -0.0004282885347492993], action=2, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.6268823146820068, 0.00027440086705610156], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6259173154830933, 0.000965039012953639], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6242260932922363, 0.0016911891289055347], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6228296756744385, 0.0013964246027171612], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6207717061042786, 0.0020579535048455], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.617029070854187, 0.003742656670510769], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6126534938812256, 0.004375589545816183], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6076633334159851, 0.0049901558086276054], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6011157631874084, 0.006547566968947649], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.594017744064331, 0.007098023314028978], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5874762535095215, 0.006541471462696791], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5815160870552063, 0.00596013804897666], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5761650204658508, 0.005351074505597353], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.569464385509491, 0.00670067872852087], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5624321103096008, 0.007032271008938551], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5561196208000183, 0.006312476471066475], action=0, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.5485939979553223, 0.007525609340518713], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5409052968025208, 0.007688732817769051], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5321528315544128, 0.008752446621656418], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.524359405040741, 0.007793405558913946], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5176243782043457, 0.006735033355653286], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.511975884437561, 0.005648498423397541], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5054791569709778, 0.00649674516171217], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4981759488582611, 0.007303196936845779], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4921310544013977, 0.0060449037700891495], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4853307902812958, 0.006800264585763216], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.47986289858818054, 0.0054678949527442455], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.47577473521232605, 0.004088155459612608], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4721180200576782, 0.003656709101051092], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.46792685985565186, 0.004191162530332804], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.46519553661346436, 0.00273133278824389], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.46396538615226746, 0.001230129157193005], action=0, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.46218737959861755, 0.0017780105117708445], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.46093058586120605, 0.0012568082893267274], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.46022215485572815, 0.0007084071403369308], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.46103161573410034, -0.0008094456279650331], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4613919258117676, -0.00036030367482453585], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4622304439544678, -0.0008385169203393161], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.462568074464798, -0.0003376509412191808], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.46240323781967163, 0.0001648421020945534], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4637228548526764, -0.0013196062063798308], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4655725061893463, -0.0018496562261134386], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.46893778443336487, -0.003365286160260439], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.47273576259613037, -0.0037979669868946075], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.47695282101631165, -0.004217056557536125], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.48152491450309753, -0.004572105128318071], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.48645085096359253, -0.004925931338220835], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4916765093803406, -0.0052256532944738865], action=1, reward=-1.0, info={}, terminal=False, 
timeout=False), RLTuple(state=[-0.49615922570228577, -0.004482726566493511], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5019213557243347, -0.005762119311839342], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5068929195404053, -0.0049715423956513405], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5120121836662292, -0.005119269248098135], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5182741284370422, -0.006261969450861216], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5246479511260986, -0.006373835727572441], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5320659875869751, -0.007418038789182901], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5384863018989563, -0.006420299410820007], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5458438396453857, -0.007357535418123007], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5540586709976196, -0.008214817382395267], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5631056427955627, -0.009047009982168674], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5729207396507263, -0.009815055876970291], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5834064483642578, -0.010485731065273285], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5935152173042297, -0.010108754970133305], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6041060090065002, -0.010590816847980022], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6151033639907837, -0.010997319594025612], action=0, reward=-1.0, info={}, terminal=False, 
timeout=False), RLTuple(state=[-0.624439001083374, -0.00933567713946104], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6341108679771423, -0.009671835228800774], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6420381665229797, -0.007927315309643745], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6491546630859375, -0.007116460241377354], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6563762426376343, -0.007221629377454519], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6616709232330322, -0.005294653121381998], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6650065183639526, -0.0033355746418237686], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6673784852027893, -0.002372015966102481], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6697537899017334, -0.0023752544075250626], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6721296906471252, -0.0023759205359965563], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6724555492401123, -0.0003258723299950361], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6727141737937927, -0.000258616782957688], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6709027290344238, 0.0018114495323970914], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6690531373023987, 0.001849588705226779], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.665203869342804, 0.003849282395094633], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6613727807998657, 0.003831090172752738], action=0, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.656609833240509, 0.004762944765388966], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6519381403923035, 0.004671653266996145], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6473859548568726, 0.004552238620817661], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.641964316368103, 0.005421589128673077], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6366990804672241, 0.005265254061669111], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6306599378585815, 0.006039144471287727], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6248966455459595, 0.005763296503573656], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6174634099006653, 0.007433258928358555], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.608390212059021, 0.009073177352547646], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5997479557991028, 0.00864227581769228], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5916116237640381, 0.008136332035064697], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5839830040931702, 0.007628574036061764], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5749471783638, 0.009035815484821796], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5665423274040222, 0.008404886350035667], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5568511486053467, 0.009691174142062664], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5469629764556885, 0.009888190776109695], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5359594821929932, 
0.011003448627889156], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5239342451095581, 0.012025258503854275], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5119067430496216, 0.012027484364807606], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.49899622797966003, 0.012910542078316212], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.48727893829345703, 0.011717271991074085], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4768848121166229, 0.010394125245511532], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.46592414379119873, 0.01096067950129509], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.45646971464157104, 0.009454426355659962], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4465978443622589, 0.009871866554021835], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.43835994601249695, 0.008237921632826328], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.43176138401031494, 0.0065985508263111115], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4249296486377716, 0.006831740960478783], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.41990453004837036, 0.005025110207498074], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.41471511125564575, 0.00518942903727293], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.411363422870636, 0.0033516876865178347], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.40985652804374695, 0.0015068941283971071], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4092411398887634, 
0.0006153829745016992], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4095379114151001, -0.00029676780104637146], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.40976011753082275, -0.00022221283870749176], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.410837858915329, -0.0010777516290545464], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4118162989616394, -0.0009784447029232979], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4136268198490143, -0.0018105192575603724], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4172731637954712, -0.0036463274154812098], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.42071884870529175, -0.003445683280006051], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.423986554145813, -0.003267704276368022], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4290522336959839, -0.0050656795501708984], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4338932931423187, -0.004841061774641275], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.43844762444496155, -0.004554328974336386], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.44370952248573303, -0.0052618905901908875], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.44860291481018066, -0.004893413744866848], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.45510900020599365, -0.006506086327135563], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.46218493580818176, -0.00707591837272048], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.469729483127594, 
-0.007544543128460646], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4787139892578125, -0.00898450706154108], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4890372157096863, -0.010323245078325272], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4996487498283386, -0.010611525736749172], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5094437003135681, -0.009794957935810089], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5193766951560974, -0.009932998567819595], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5304019451141357, -0.011025226674973965], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5414050221443176, -0.011003118008375168], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5532904267311096, -0.01188535988330841], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5649632811546326, -0.011672891676425934], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5754072666168213, -0.010443956591188908], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5854645371437073, -0.010057277046144009], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.595064640045166, -0.009600082412362099], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6051974296569824, -0.01013281848281622], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6157484650611877, -0.010551031678915024], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6266798377037048, -0.010931397788226604], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6369279623031616, 
-0.010248111560940742], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6453897356987, -0.008461764082312584], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6540260314941406, -0.008636277168989182], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6627405881881714, -0.008714554831385612], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6704939603805542, -0.007753359153866768], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6781898736953735, -0.0076959580183029175], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6837999820709229, -0.005610120948404074], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6872962713241577, -0.0034962547942996025], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6906664967536926, -0.0033702312503010035], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6938546895980835, -0.003188185626640916], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6958429217338562, -0.0019882475025951862], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6956221461296082, 0.00022078800247982144], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6932061314582825, 0.002415986265987158], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6905931830406189, 0.0026129670441150665], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6867866516113281, 0.003806540509685874], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.680873692035675, 0.0059129828587174416], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6728257536888123, 
0.008047910407185555], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.663735568523407, 0.009090213105082512], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6536523103713989, 0.010083233937621117], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6416891813278198, 0.011963138356804848], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6278913617134094, 0.013797796331346035], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6133575439453125, 0.01453380286693573], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5972093939781189, 0.016148151829838753], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5815202593803406, 0.015689173713326454], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5664403438568115, 0.015079930424690247], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5510434508323669, 0.01539685484021902], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5354695320129395, 0.015573938377201557], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5198460817337036, 0.015623425133526325], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5053066611289978, 0.014539459720253944], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.48997607827186584, 0.015330573543906212], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4759466350078583, 0.014029419980943203], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4613509476184845, 0.014595690183341503], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.44822463393211365, 0.013126308098435402], action=0, 
reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.43469372391700745, 0.01353093609213829], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4228416085243225, 0.011852100491523743], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.41179323196411133, 0.01104838028550148], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4026416838169098, 0.009151531383395195], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3934576213359833, 0.009184062480926514], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3862406313419342, 0.007217005360871553], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.38110411167144775, 0.005136516410857439], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.37802451848983765, 0.0030795906204730272], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.37608978152275085, 0.0019347519846633077], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.376275897026062, -0.00018612563144415617], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.37758931517601013, -0.0013134113978594542], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3799859881401062, -0.002396687865257263], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3844859004020691, -0.004499909933656454], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.391023188829422, -0.006537280511111021], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3995897173881531, -0.008566536009311676], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.40809670090675354, -0.008506979793310165], action=2, reward=-1.0, 
info={}, terminal=False, timeout=False), RLTuple(state=[-0.41846513748168945, -0.01036843005567789], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.42965954542160034, -0.01119441818445921], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4425753355026245, -0.012915773317217827], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4551335275173187, -0.012558194808661938], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.46927693486213684, -0.014143422245979309], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.48284268379211426, -0.01356574147939682], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.49573031067848206, -0.012887642718851566], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5098758935928345, -0.01414557360112667], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5241631269454956, -0.014287253841757774], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5374845266342163, -0.013321395963430405], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5517358183860779, -0.014251254498958588], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5668113827705383, -0.015075573697686195], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.581641435623169, -0.014830038882791996], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5951007008552551, -0.01345925871282816], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6071028709411621, -0.012002185918390751], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6195659637451172, -0.012463082559406757], action=0, reward=-1.0, info={}, 
terminal=False, timeout=False), RLTuple(state=[-0.6323779821395874, -0.012812059372663498], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6444051861763, -0.012027188204228878], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6555779576301575, -0.011172753758728504], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6648167371749878, -0.009238816797733307], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6720871329307556, -0.007270357105880976], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6792952418327332, -0.007208111695945263], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.684380829334259, -0.005085593555122614], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.688309907913208, -0.00392910884693265], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.69107586145401, -0.0027659237384796143], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6936991214752197, -0.0026232816744595766], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6951149702072144, -0.0014158504782244563], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.695302426815033, -0.00018745796114671975], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.694263219833374, 0.0010392206022515893], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6920142769813538, 0.00224892795085907], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6896190643310547, 0.002395226387307048], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6871051788330078, 0.002513863379135728], action=0, reward=-1.0, info={}, terminal=False, 
timeout=False), RLTuple(state=[-0.682488739490509, 0.00461648590862751], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6767570972442627, 0.00573161942884326], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6689606308937073, 0.007796445395797491], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6611605286598206, 0.007800129242241383], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6534262299537659, 0.007734276354312897], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6457879543304443, 0.007638269569724798], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6382679343223572, 0.007520031183958054], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6309633851051331, 0.007304565981030464], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.623930037021637, 0.007033334579318762], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6172245144844055, 0.006705543026328087], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6098425388336182, 0.007381973788142204], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6008661985397339, 0.008976349607110023], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5923908352851868, 0.008475332520902157], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5834048390388489, 0.008985989727079868], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5739762783050537, 0.009428556077182293], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5631902813911438, 0.010786032304167747], action=2, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.5531371831893921, 0.010053073056042194], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5418825149536133, 0.011254682205617428], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5315516591072083, 0.010330884717404842], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5212188363075256, 0.010332804173231125], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5109627842903137, 0.010256021283566952], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5008219480514526, 0.010140886530280113], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4908835291862488, 0.009938403964042664], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.48025569319725037, 0.010627823881804943], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.47099706530570984, 0.009258619509637356], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.46317294239997864, 0.007824134081602097], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.45483896136283875, 0.008333978243172169], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.44808441400527954, 0.006754559930413961], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.441953182220459, 0.006131226196885109], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.43747642636299133, 0.004476745147258043], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.433697909116745, 0.0037785274907946587], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.42963293194770813, 0.004064956679940224], action=2, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.4272706210613251, 0.0023623211309313774], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4247116148471832, 0.0025590008590370417], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.42391642928123474, 0.0007952019805088639], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.42389750480651855, 1.892348336696159e-05], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4236239790916443, 0.0002735196612775326], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4250958561897278, -0.0014718850143253803], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4273810088634491, -0.0022851345129311085], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4293995797634125, -0.0020185860339552164], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4311494529247284, -0.0017498572124168277], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.43464288115501404, -0.0034934361465275288], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4398445785045624, -0.005201704800128937], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.44670039415359497, -0.006855817511677742], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.45417797565460205, -0.007477583363652229], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4632214307785034, -0.009043456986546516], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.471725195646286, -0.008503751829266548], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.47962725162506104, -0.007902048528194427], action=2, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.4889170825481415, -0.00928984209895134], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4995116591453552, -0.01059458963572979], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5093461275100708, -0.00983448512852192], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5193578600883484, -0.010011715814471245], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.530423104763031, -0.011065240018069744], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.542492687702179, -0.01206954661756754], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5534803867340088, -0.010987735353410244], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5652887225151062, -0.011808336712419987], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5768511295318604, -0.011562422849237919], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5870924592018127, -0.01024128869175911], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5978679060935974, -0.010775484144687653], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6070998311042786, -0.009231885895133018], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6167905926704407, -0.009690770879387856], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.626807451248169, -0.010016866959631443], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6371209025382996, -0.010313465259969234], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6466594338417053, -0.009538501501083374], action=1, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.6553665399551392, -0.008707142435014248], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6621190309524536, -0.006752470042556524], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6689125299453735, -0.006793494801968336], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6756657361984253, -0.006753238849341869], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6813359260559082, -0.005670188460499048], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6868806481361389, -0.0055446932092309], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6913250684738159, -0.004444430582225323], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6956409215927124, -0.004315853584557772], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6977431178092957, -0.002102165250107646], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6996490359306335, -0.0019059664336964488], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7013376355171204, -0.0016885794466361403], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7017611861228943, -0.00042355034383945167], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7019112706184387, -0.0001500823418609798], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.701808512210846, 0.00010274641681462526], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7004781365394592, 0.0013304156018421054], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6968847513198853, 0.0035933679901063442], action=2, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.6911073923110962, 0.005777379963546991], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6841573715209961, 0.006950004491955042], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6761101484298706, 0.008047192357480526], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6669698357582092, 0.009140326641499996], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6568120718002319, 0.010157771408557892], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6457481980323792, 0.01106385700404644], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6328083872795105, 0.012939805164933205], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6181405186653137, 0.014667868614196777], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6028442978858948, 0.01529624778777361], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5880255699157715, 0.014818704687058926], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5727776885032654, 0.01524790283292532], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5581898093223572, 0.01458785030990839], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5433822274208069, 0.014807596802711487], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5284692645072937, 0.014912975020706654], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5125822424888611, 0.015887022018432617], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.49782806634902954, 0.014754174277186394], action=0, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.4843044877052307, 0.013523568399250507], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4701324701309204, 0.014172029681503773], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.45542600750923157, 0.014706451445817947], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4402877390384674, 0.015138288959860802], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4248470067977905, 0.015440710820257664], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.41114455461502075, 0.013702468015253544], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3993058502674103, 0.011838693171739578], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3884105384349823, 0.010895316489040852], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.37855374813079834, 0.009856787510216236], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.36884352564811707, 0.009710218757390976], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3592977523803711, 0.009545775130391121], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3499380350112915, 0.009359716437757015], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.34187135100364685, 0.008066685870289803], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.33516669273376465, 0.00670465687289834], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3298436999320984, 0.0053230044431984425], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3259719908237457, 0.0038716995622962713], action=1, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.3235418498516083, 0.00243013771250844], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3225719630718231, 0.0009698907961137593], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3230527639389038, -0.0004808007797691971], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.32395827770233154, -0.0009055103873834014], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.32527396082878113, -0.0013156856875866652], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3270091414451599, -0.0017351702554151416], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3291476368904114, -0.0021385038271546364], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.33267033100128174, -0.003522680839523673], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3385743796825409, -0.005904067773371935], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3458311855792999, -0.00725678913295269], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3554081618785858, -0.009576971642673016], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.36522138118743896, -0.009813242591917515], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3762608766555786, -0.011039480566978455], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.38741809129714966, -0.01115722768008709], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3996557593345642, -0.01223766803741455], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.41188845038414, -0.012232692912220955], action=2, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.4260327219963074, -0.014144258573651314], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.44198891520500183, -0.015956204384565353], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4595973491668701, -0.017608441412448883], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.477752685546875, -0.018155336380004883], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.49728038907051086, -0.019527703523635864], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5170096158981323, -0.019729238003492355], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5378297567367554, -0.0208201315253973], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5595858693122864, -0.021756097674369812], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5811392664909363, -0.02155340276658535], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6023310422897339, -0.021191800013184547], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6240003108978271, -0.02166924625635147], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6449528336524963, -0.020952492952346802], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6660266518592834, -0.02107381820678711], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6860679388046265, -0.020041311159729958], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7039881944656372, -0.017920251935720444], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.719685971736908, -0.015697790309786797], action=2, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.7330267429351807, -0.013340791687369347], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7439622282981873, -0.01093545276671648], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7523941397666931, -0.008431901223957539], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7602511644363403, -0.007857019081711769], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7654819488525391, -0.005230822134763002], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7681103944778442, -0.002628448884934187], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7681325674057007, -2.216328175563831e-05], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7655192017555237, 0.0026133849751204252], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7613048553466797, 0.0042143273167312145], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7564687132835388, 0.004836132284253836], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7490378618240356, 0.007430858910083771], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7390862107276917, 0.009951657615602016], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7266337871551514, 0.012452425435185432], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7138216495513916, 0.01281213853508234], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6996731162071228, 0.014148540794849396], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6853371858596802, 0.014335950836539268], action=0, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.6698561310768127, 0.015481043606996536], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6523776650428772, 0.017478466033935547], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6330317258834839, 0.019345909357070923], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6138885021209717, 0.019143259152770042], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5951533317565918, 0.018735161051154137], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5758986473083496, 0.019254671409726143], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5562707185745239, 0.019627956673502922], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5374215841293335, 0.01884910650551319], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5185425281524658, 0.018879059702157974], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5007860064506531, 0.017756493762135506], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4832504987716675, 0.01753551885485649], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4670874774456024, 0.016163049265742302], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4523734450340271, 0.014714024029672146], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4391998052597046, 0.013173647224903107], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4266754388809204, 0.012524344958364964], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.41489115357398987, 0.011784299276769161], action=1, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.40391966700553894, 0.01097146887332201], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3948328495025635, 0.009086823090910912], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.38672828674316406, 0.008104554377496243], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.37969696521759033, 0.0070313275791704655], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3747478127479553, 0.004949149210005999], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3699158728122711, 0.004831952042877674], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.36728718876838684, 0.0026286961510777473], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3658738434314728, 0.0014133407967165112], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.36662745475769043, -0.0007536247721873224], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3675842583179474, -0.000956808973569423], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3707543611526489, -0.00317010167054832], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.37511080503463745, -0.004356446675956249], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.37956953048706055, -0.004458716604858637], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3851472735404968, -0.005577730946242809], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.390754759311676, -0.005607502069324255], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.39634525775909424, -0.005590497050434351], action=2, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.403875470161438, -0.00753021938726306], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4133603870868683, -0.009484910406172276], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4237235486507416, -0.01036315131932497], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.43590405583381653, -0.012180518358945847], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.44978034496307373, -0.013876267708837986], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4632178544998169, -0.013437525369226933], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4781390428543091, -0.014921179972589016], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4924006760120392, -0.014261636883020401], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.505940854549408, -0.013540154322981834], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5186970829963684, -0.012756230309605598], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5315362215042114, -0.012839156202971935], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5443320870399475, -0.012795854359865189], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5570091009140015, -0.012677016668021679], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5684456825256348, -0.011436572298407555], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5795915126800537, -0.011145880445837975], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5903295278549194, -0.010737979784607887], action=1, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.6015780568122864, -0.01124853827059269], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.612273097038269, -0.010695042088627815], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6223455667495728, -0.0100724957883358], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6306979656219482, -0.008352342061698437], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6373077034950256, -0.0066097392700612545], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6420817971229553, -0.004774104338139296], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6450402140617371, -0.0029584430158138275], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6461442112922668, -0.001103964983485639], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6454048752784729, 0.0007393205305561423], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6438308954238892, 0.0015739620430395007], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6403946280479431, 0.0034363034646958113], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6351686120033264, 0.00522600719705224], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6301481127738953, 0.00502051180228591], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6243968605995178, 0.005751202814280987], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.617953896522522, 0.006442998070269823], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6108292937278748, 0.007124566473066807], action=1, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.6021121144294739, 0.008717180229723454], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5918826460838318, 0.010229472070932388], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5801675915718079, 0.011715083383023739], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5690932869911194, 0.01107428316026926], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5567342638969421, 0.01235902775079012], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5451797246932983, 0.011554558761417866], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5335444808006287, 0.01163520384579897], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5208942890167236, 0.012650198303163052], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5092874765396118, 0.011606806889176369], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4987988770008087, 0.010488610714673996], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.48758313059806824, 0.0112157566472888], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4766516387462616, 0.01093147974461317], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4660720229148865, 0.01057962141931057], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4559739828109741, 0.010098057799041271], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4474039077758789, 0.008570067584514618], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4404352605342865, 0.006968642119318247], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4351687431335449, 
0.005266535561531782], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.43160882592201233, 0.003559897653758526], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.42876362800598145, 0.002845209091901779], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4256289303302765, 0.0031347074545919895], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.42429855465888977, 0.001330363331362605], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.42472028732299805, -0.00042174916598014534], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.425878643989563, -0.001158347469754517], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4268037974834442, -0.0009251388255506754], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4274972975254059, -0.0006935125566087663], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.42897772789001465, -0.0014804401434957981], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4312284290790558, -0.00225068349391222], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.43317970633506775, -0.0019512861035764217], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.43683844804763794, -0.003658744040876627], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4402036666870117, -0.003365210723131895], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.44418904185295105, -0.00398536492139101], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.44981321692466736, -0.005624187644571066], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.45702266693115234, 
-0.007209440227597952], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4647813141345978, -0.007758655585348606], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.47298958897590637, -0.008208267390727997], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4815932512283325, -0.0086036566644907], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.48954829573631287, -0.007955070585012436], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4967966079711914, -0.007248306646943092], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5052783489227295, -0.00848174374550581], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5139791965484619, -0.008700834587216377], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5237712264060974, -0.009792032651603222], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.534641444683075, -0.010870205238461494], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5464503765106201, -0.011808955110609531], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5591621398925781, -0.012711775489151478], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5716877579689026, -0.01252557523548603], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5848554968833923, -0.01316777989268303], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5975918769836426, -0.012736343778669834], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6098501086235046, -0.012258226983249187], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6205460429191589, 
-0.010695967823266983], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6305891275405884, -0.010043100453913212], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6389150619506836, -0.008325918577611446], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6463968753814697, -0.007481801323592663], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6530277132987976, -0.006630809977650642], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6587209701538086, -0.005693314131349325], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.662455677986145, -0.0037346642930060625], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6641841530799866, -0.0017284934874624014], action=2, reward=-1.0, info={}, terminal=False, timeout=True)]),
 EpisodeItem(id='c4c73d718baf49d69ed2f9ad8e5a4ad4', created_by='local_storage', published_in=['local_storage'], created_at='2026-04-15T10:35:26.795063', benchmark_id='ca2fd26d462d465d8ee41d64b042c2a1', metadata={'cum_eps_reward': [-999.0], 'is_expert': False}, n_tuples=999, terminated=False, timeout=True, tuples=[RLTuple(state=[-0.4620797634124756, -0.0015533595578745008], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.46311667561531067, -0.0010369353694841266], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4636554718017578, -0.0005387924029491842], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4636871814727783, -3.168750845361501e-05], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4631742835044861, 0.0005128714838065207], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4641742408275604, -0.0009999375324696302], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4647003710269928, -0.0005261305486783385], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.46568867564201355, -0.000988305313512683], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.46715834736824036, -0.0014696900034323335], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4681317210197449, -0.0009733674232847989], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4685547351837158, -0.00042301174835301936], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4694516062736511, -0.0008968732436187565], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4707925021648407, -0.0013408743543550372], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.47355666756629944, 
-0.0027641840279102325], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.47678086161613464, -0.0032242017332464457], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4803967773914337, -0.0036158922594040632], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4833410978317261, -0.002944346284493804], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.48666054010391235, -0.00331941363401711], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4913409650325775, -0.004680447280406952], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4963468909263611, -0.005005928222090006], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.500621497631073, -0.004274584352970123], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5041199922561646, -0.0034985167440027], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5077944397926331, -0.0036744100507348776], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5106481313705444, -0.0028537169564515352], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.514640748500824, -0.003992646466940641], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5197306871414185, -0.005089929327368736], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.524857223033905, -0.005126517731696367], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5300573706626892, -0.00520012341439724], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5352672934532166, -0.00520992511883378], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5393897891044617, 
-0.004122511949390173], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5444065928459167, -0.0050168149173259735], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5503265261650085, -0.005919900722801685], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5550972819328308, -0.004770768340677023], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5606721043586731, -0.005574864335358143], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5660036206245422, -0.005331498570740223], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5710687041282654, -0.005065055564045906], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.575859546661377, -0.00479084812104702], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5812704563140869, -0.0054109361954033375], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5863335132598877, -0.005063020624220371], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.590011715888977, -0.0036782226525247097], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5922427773475647, -0.002231091493740678], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5949831008911133, -0.0027402909472584724], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5962463617324829, -0.0012632847065106034], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5960075259208679, 0.00023884265101514757], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5962688326835632, -0.00026131432969123125], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5960057377815247, 
0.00026312435511499643], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5952776670455933, 0.0007280674763023853], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5940197706222534, 0.0012578937457874417], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5933127403259277, 0.0007070290157571435], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5911433696746826, 0.002169352490454912], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5885312557220459, 0.0026121053379029036], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5854758620262146, 0.003055403009057045], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5829851627349854, 0.0024906927719712257], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5801077485084534, 0.0028774505481123924], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.577838122844696, 0.002269615652039647], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5742012858390808, 0.0036368495784699917], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5711898803710938, 0.0030113623943179846], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5688499808311462, 0.0023399244528263807], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5671938061714172, 0.001656163134612143], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5662268400192261, 0.0009669645805843174], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5639423727989197, 0.0022844653576612473], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5604087710380554, 
0.003533588722348213], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5566232204437256, 0.003785578301176429], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.551607608795166, 0.005015608388930559], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5454546809196472, 0.0061529241502285], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5391408801078796, 0.0063137938268482685], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5317701697349548, 0.00737070944160223], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.525379478931427, 0.006390734575688839], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5199962258338928, 0.005383205134421587], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5146602988243103, 0.005335927940905094], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.509441614151001, 0.005218690726906061], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5043932199478149, 0.005048389546573162], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5005105137825012, 0.003882738994434476], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.49582037329673767, 0.0046901400201022625], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4924129843711853, 0.0034073791466653347], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4892588257789612, 0.003154163248836994], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4853821098804474, 0.0038767135702073574], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.48086974024772644, 0.004512356594204903], 
action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4777056872844696, 0.003164049470797181], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.47494763135910034, 0.002758071990683675], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4735751152038574, 0.001372512779198587], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4716646075248718, 0.0019105055835098028], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.47122058272361755, 0.0004440185730345547], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.47224465012550354, -0.0010240724077448249], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.47370782494544983, -0.00146315002348274], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4755919575691223, -0.0018841472920030355], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.47692349553108215, -0.0013315515825524926], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4796389937400818, -0.002715469803661108], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4837602376937866, -0.004121263977140188], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.48925310373306274, -0.00549285439774394], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.49404317140579224, -0.004790080711245537], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.49811434745788574, -0.004071182571351528], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5024473071098328, -0.004332936368882656], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.506969153881073, -0.00452187517657876], action=1, 
reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5116336345672607, -0.004664445295929909], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5174476504325867, -0.005814045667648315], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5243600606918335, -0.006912407465279102], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5313421487808228, -0.006982103455811739], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5393414497375488, -0.007999276742339134], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5472245216369629, -0.007883044891059399], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5539529323577881, -0.006728441454470158], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5604893565177917, -0.006536389701068401], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5657520890235901, -0.005262777209281921], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5717750787734985, -0.006022996734827757], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5774879455566406, -0.0057128663174808025], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.583870530128479, -0.006382535211741924], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.588874340057373, -0.005003852769732475], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5924260020256042, -0.0035516477655619383], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5945032835006714, -0.00207728473469615], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5971071124076843, -0.0026038025971502066], action=0, reward=-1.0, 
info={}, terminal=False, timeout=False), RLTuple(state=[-0.6002007722854614, -0.003093663603067398], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6037359833717346, -0.003535239025950432], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6077535152435303, -0.004017519764602184], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.610228419303894, -0.0024748824071139097], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6110916137695312, -0.0008632463868707418], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6113694906234741, -0.0002778734778985381], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6110629439353943, 0.0003065424971282482], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6111790537834167, -0.00011605852341745049], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6097182631492615, 0.0014607880730181932], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6066911816596985, 0.0030270610004663467], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6020748019218445, 0.0046163639053702354], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5969110727310181, 0.005163747351616621], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5912734866142273, 0.0056375740095973015], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5841622352600098, 0.007111242972314358], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5756305456161499, 0.008531737141311169], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5657809972763062, 0.009849541820585728], action=2, reward=-1.0, info={}, 
terminal=False, timeout=False), RLTuple(state=[-0.5556802153587341, 0.010100752115249634], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5443792343139648, 0.011301005259156227], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5320006608963013, 0.012378562241792679], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5206412672996521, 0.01135940756648779], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5103907585144043, 0.01025049202144146], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5002973675727844, 0.010093367658555508], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4894266426563263, 0.010870737954974174], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.47788745164871216, 0.011539188213646412], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.46772903203964233, 0.010158422403037548], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.45801135897636414, 0.009717668406665325], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.44982481002807617, 0.008186559192836285], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.44326338171958923, 0.006561423186212778], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4383825659751892, 0.004880816675722599], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.43514204025268555, 0.0032405382953584194], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4316386878490448, 0.003503341693431139], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.42781737446784973, 0.0038213245570659637], action=2, reward=-1.0, info={}, terminal=False, 
timeout=False), RLTuple(state=[-0.42576864361763, 0.002048719208687544], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4244438409805298, 0.0013248197501525283], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.423903226852417, 0.0005405936390161514], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4241439402103424, -0.00024069903884083033], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.42513179779052734, -0.0009878630517050624], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.42785099148750305, -0.0027192039415240288], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4303564429283142, -0.002505444223061204], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4336375296115875, -0.0032810880802571774], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.438615620136261, -0.004978087730705738], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.44431251287460327, -0.005696894600987434], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.450639009475708, -0.006326483562588692], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4585273265838623, -0.00788831152021885], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4659695029258728, -0.007442180532962084], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4729262888431549, -0.006956805475056171], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.48129528760910034, -0.008368981070816517], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.48901286721229553, -0.007717592641711235], action=2, reward=-1.0, info={}, terminal=False, 
timeout=False), RLTuple(state=[-0.49799850583076477, -0.008985637687146664], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5082330703735352, -0.010234583169221878], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5196212530136108, -0.011388181708753109], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5300408601760864, -0.010419604368507862], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5414646863937378, -0.01142383087426424], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5527731776237488, -0.011308453977108002], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5629401803016663, -0.010167027823626995], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5718720555305481, -0.008931845426559448], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5794469714164734, -0.007574940100312233], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5856143832206726, -0.006167425774037838], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5913693904876709, -0.0057550170458853245], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5956305265426636, -0.004261126276105642], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5983697175979614, -0.0027391863986849785], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6015850305557251, -0.00321531156077981], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6032876968383789, -0.0017026611603796482], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6044674515724182, -0.0011797742918133736], action=1, reward=-1.0, info={}, terminal=False, 
timeout=False), RLTuple(state=[-0.6050931811332703, -0.0006257205968722701], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6051626205444336, -6.940473394934088e-05], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6046956777572632, 0.0004669383342843503], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6027078032493591, 0.0019878901075571775], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5991933345794678, 0.0035144281573593616], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5951886177062988, 0.004004735965281725], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5907052755355835, 0.004483333323150873], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5847997665405273, 0.005905528552830219], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5784826874732971, 0.006317084655165672], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5727648138999939, 0.005717862397432327], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5667204856872559, 0.006044305860996246], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5603989362716675, 0.0063215866684913635], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5548335313796997, 0.0055653927847743034], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5500695109367371, 0.004763992503285408], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5441445112228394, 0.0059249852783977985], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.537094235420227, 0.0070502799935638905], action=2, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.5310179591178894, 0.006076272577047348], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.52593994140625, 0.0050780451856553555], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5199065804481506, 0.006033382844179869], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5139585733413696, 0.0059479898773133755], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5091069340705872, 0.004851627629250288], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5034359097480774, 0.005671034567058086], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.49796468019485474, 0.0054712132550776005], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4937131404876709, 0.004251543898135424], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4897054433822632, 0.004007700830698013], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4850206971168518, 0.004684746265411377], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4816705584526062, 0.003350131446495652], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.47863689064979553, 0.003033681306988001], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.47697383165359497, 0.0016630615573376417], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.47573012113571167, 0.0012437158729881048], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4739108979701996, 0.0018192214192822576], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4715195894241333, 0.0023913164623081684], action=2, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.4706002175807953, 0.0009193597361445427], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4701283276081085, 0.0004718777199741453], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4710913598537445, -0.0009630135027691722], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.47345075011253357, -0.0023594049271196127], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4761889576911926, -0.0027381908148527145], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4793001711368561, -0.0031112227588891983], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.48281365633010864, -0.0035134984645992517], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.48569074273109436, -0.0028770864009857178], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4899305999279022, -0.004239832982420921], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.49543511867523193, -0.005504521541297436], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5002095103263855, -0.0047743795439600945], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.505210280418396, -0.0050007994286715984], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5094142556190491, -0.004203954711556435], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5127448439598083, -0.003330587176606059], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5151803493499756, -0.0024355307687073946], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5187106132507324, -0.0035302808973938227], action=0, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.5212984085083008, -0.002587751019746065], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5249152183532715, -0.003616807283833623], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5295274257659912, -0.004612228833138943], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5351417660713196, -0.005614311899989843], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5407023429870605, -0.005560589488595724], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5472149848937988, -0.006512636784464121], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.554569661617279, -0.0073547218926250935], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.561725378036499, -0.007155676372349262], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5696666240692139, -0.007941252551972866], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5763413906097412, -0.00667474465444684], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5816472172737122, -0.00530586251989007], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5875250101089478, -0.005877808202058077], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5929906368255615, -0.005465621594339609], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5969498753547668, -0.003959193825721741], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6004188656806946, -0.003469032235443592], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6033341288566589, -0.0029152201022952795], action=1, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.604672908782959, -0.001338787260465324], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6054509878158569, -0.0007781008607707918], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6046656966209412, 0.0007852792041376233], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6033093333244324, 0.0013563488610088825], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6004076600074768, 0.002901701023802161], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5979411005973816, 0.002466543111950159], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.593949019908905, 0.003992094658315182], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5884336829185486, 0.0055153691209852695], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5814746618270874, 0.006958963815122843], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5731542110443115, 0.008320506662130356], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5645243525505066, 0.008629810996353626], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5565913915634155, 0.007932956330478191], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5474555492401123, 0.009135870262980461], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5381640195846558, 0.009291518479585648], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5297752618789673, 0.008388788439333439], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.522416889667511, 0.007358354516327381], action=0, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.5161131024360657, 0.006303796544671059], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5109120607376099, 0.0052010235376656055], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.504838228225708, 0.00607384042814374], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4979778230190277, 0.006860385183244944], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.49135205149650574, 0.006625796668231487], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4849870800971985, 0.006364947650581598], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4789591133594513, 0.0060279713943600655], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4743484854698181, 0.004610637668520212], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4711673855781555, 0.0031810912769287825], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4674232602119446, 0.003744141198694706], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.46514812111854553, 0.002275136997923255], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4643889367580414, 0.0007591701578348875], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4651273190975189, -0.0007383712218143046], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.46638837456703186, -0.0012610669946298003], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4681301414966583, -0.0017417578492313623], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.46936485171318054, -0.0012346983421593904], action=2, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.4720255732536316, -0.0026607345789670944], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.47609999775886536, -0.004074424505233765], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.47960084676742554, -0.0035008369013667107], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4844367504119873, -0.004835901781916618], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4906136393547058, -0.00617691595107317], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4970634877681732, -0.006449843291193247], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5027965307235718, -0.005733045283704996], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5077382922172546, -0.004941771272569895], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5118641257286072, -0.004125795792788267], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5150841474533081, -0.0032200647983700037], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5183988809585571, -0.0033146825153380632], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5217534303665161, -0.0033545943442732096], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.526185929775238, -0.004432457499206066], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5316268801689148, -0.005441001150757074], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.538093626499176, -0.006466742139309645], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5455363392829895, -0.007442703004926443], action=0, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.552893340587616, -0.007357007823884487], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.559059202671051, -0.006165867205709219], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5649682879447937, -0.005909034051001072], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5715976357460022, -0.006629372481256723], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5779420733451843, -0.006344416178762913], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5829121470451355, -0.004970096983015537], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5864418745040894, -0.0035297481808811426], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5895176529884338, -0.003075747285038233], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5921794772148132, -0.002661819336935878], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.594408392906189, -0.002228911267593503], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5971697568893433, -0.0027613877318799496], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5994538068771362, -0.0022840453311800957], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6012437343597412, -0.001789923757314682], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6025035977363586, -0.0012598553439602256], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.60219806432724, 0.0003055022389162332], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6003561615943909, 0.0018419049447402358], action=2, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.5990056395530701, 0.0013505668612197042], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5981320738792419, 0.0008735337178222835], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5957745909690857, 0.002357504330575466], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5918839573860168, 0.003890636609867215], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5865125060081482, 0.005371415987610817], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5807346105575562, 0.005777925252914429], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5745490193367004, 0.006185558624565601], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5669994950294495, 0.007549549452960491], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5592004060745239, 0.0077990698628127575], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5511481165885925, 0.008052289485931396], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5439652800559998, 0.007182827219367027], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5356771349906921, 0.00828818790614605], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.526353657245636, 0.009323452599346638], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5170776844024658, 0.009275969117879868], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5068805813789368, 0.010197118856012821], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4978550970554352, 0.009025479666888714], action=0, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.49010005593299866, 0.0077550411224365234], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.48366186022758484, 0.006438203155994415], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.47657039761543274, 0.00709144352003932], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4688740670681, 0.007696330547332764], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.46061715483665466, 0.008256909437477589], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4518496096134186, 0.008767555467784405], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.44263237714767456, 0.009217227809131145], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.43505188822746277, 0.007580501027405262], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.42820191383361816, 0.006849963217973709], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4231380820274353, 0.005063826683908701], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4178263247013092, 0.005311776418238878], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4143410325050354, 0.0034852810204029083], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.410736083984375, 0.0036049429327249527], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4070492088794708, 0.0036868813913315535], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4052824079990387, 0.0017668086802586913], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4034050405025482, 0.0018773592310026288], action=2, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.40249505639076233, 0.0009099919116124511], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.40152865648269653, 0.0009663997334428132], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.40149784088134766, 3.079755697399378e-05], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.402397096157074, -0.0008992458460852504], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.40325894951820374, -0.000861835025716573], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.40600886940956116, -0.002749936655163765], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.40967878699302673, -0.0036699010524898767], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.41522055864334106, -0.00554179260507226], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.42162153124809265, -0.006400957237929106], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4278194010257721, -0.00619786512106657], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4337644875049591, -0.005945099052041769], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.44042137265205383, -0.006656872574239969], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.44669848680496216, -0.00627713929861784], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.45360252261161804, -0.006904012057930231], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.46006205677986145, -0.006459536496549845], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4680699408054352, -0.008007894270122051], action=0, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.4755026400089264, -0.007432688493281603], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4843672215938568, -0.008864578790962696], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4925902187824249, -0.008223014883697033], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5020645260810852, -0.009474321268498898], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5107531547546387, -0.008688598871231079], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5206028819084167, -0.009849727153778076], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5304844379425049, -0.009881554171442986], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5403740406036377, -0.009889612905681133], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5511700510978699, -0.010795986279845238], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.560779869556427, -0.009609874337911606], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5711884498596191, -0.010408579371869564], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5803173184394836, -0.009128833189606667], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5890456438064575, -0.008728311397135258], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5963332653045654, -0.007287653163075447], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6031460165977478, -0.006812721025198698], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.608389139175415, -0.005243120715022087], action=2, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.6120142936706543, -0.0036251996643841267], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6140592694282532, -0.0020449701696634293], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.616455078125, -0.0023958024103194475], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6191974878311157, -0.0027423882856965065], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6212918758392334, -0.002094410825520754], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.622667133808136, -0.001375243067741394], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6243771910667419, -0.0017100849654525518], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6253997087478638, -0.0010224975412711501], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6266892552375793, -0.0012895463733002543], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6262644529342651, 0.0004248041659593582], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6251477003097534, 0.0011167596094310284], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6233533620834351, 0.001794312964193523], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.621870219707489, 0.0014831381849944592], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6197110414505005, 0.002159183146432042], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6168670058250427, 0.002844033529981971], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6123937964439392, 0.0044732061214745045], action=2, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.6072793006896973, 0.005114544648677111], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6006025671958923, 0.006676734425127506], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5944199562072754, 0.006182569079101086], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5877220630645752, 0.006697915494441986], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5795899033546448, 0.008132151328027248], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5720922350883484, 0.007497698999941349], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5643031597137451, 0.0077890390530228615], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5572402477264404, 0.007062949705868959], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.54996657371521, 0.007273650262504816], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5435577034950256, 0.006408871151506901], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5380790829658508, 0.005478589795529842], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5315131545066833, 0.006565964315086603], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5259196162223816, 0.005593517795205116], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5203738212585449, 0.005545794032514095], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5148882269859314, 0.005485619883984327], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5085333585739136, 0.006354848388582468], action=2, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.5023049712181091, 0.006228374317288399], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4962388873100281, 0.006066089496016502], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4903920590877533, 0.005846826825290918], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4858677089214325, 0.0045243497006595135], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.482702374458313, 0.003165356582030654], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4798746109008789, 0.0028277563396841288], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.47645434737205505, 0.003420249791815877], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4744398593902588, 0.002014514524489641], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4718639552593231, 0.0025758875999599695], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4697414040565491, 0.00212254305370152], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4680613577365875, 0.0016800708835944533], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4678497910499573, 0.00021155639842618257], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4680832028388977, -0.00023342019994743168], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.46978288888931274, -0.0016996670747175813], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4719149172306061, -0.002132028341293335], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4744356870651245, -0.00252078240737319], action=1, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.47840702533721924, -0.003971352241933346], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4837218225002289, -0.005314779933542013], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4904169738292694, -0.0066951666958630085], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4974372982978821, -0.007020309567451477], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5046971440315247, -0.007259872276335955], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5111780762672424, -0.006480897776782513], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5187531113624573, -0.0075750527903437614], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5264056921005249, -0.007652556989341974], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.534098744392395, -0.007693089544773102], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5427260398864746, -0.008627290837466717], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5502654910087585, -0.007539437618106604], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5586725473403931, -0.008407083339989185], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5658915638923645, -0.007219002116471529], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5728152990341187, -0.0069237155839800835], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5784335732460022, -0.005618303548544645], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5837113857269287, -0.005277805961668491], action=1, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.5895923376083374, -0.0058809188194572926], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5960559248924255, -0.00646359845995903], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6030104756355286, -0.006954575423151255], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.608433187007904, -0.005422693677246571], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6132676601409912, -0.004834446590393782], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6184550523757935, -0.005187393166124821], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6239617466926575, -0.0055067227222025394], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6288076639175415, -0.004845907911658287], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6339023113250732, -0.005094646010547876], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6382052898406982, -0.0043029929511249065], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.641742467880249, -0.003537154057994485], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6444404125213623, -0.002697960939258337], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6462535262107849, -0.0018131077522411942], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6472154855728149, -0.0009619859047234058], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6462924480438232, 0.0009230475989170372], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6455094814300537, 0.0007830053800716996], action=0, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.6429040431976318, 0.0026054177433252335], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6404794454574585, 0.0024246207904070616], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6362161040306091, 0.004263297654688358], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6311637759208679, 0.005052371881902218], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.626324474811554, 0.0048392475582659245], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6217455863952637, 0.0045789023861289024], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6154459714889526, 0.006299618631601334], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6074888706207275, 0.007957096211612225], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5999805927276611, 0.007508289068937302], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5909517407417297, 0.009028824977576733], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5804408192634583, 0.010510969907045364], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5685513615608215, 0.011889411136507988], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5563901662826538, 0.012161238119006157], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5439983010292053, 0.012391828931868076], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5304697751998901, 0.013528550043702126], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5158914923667908, 0.014578301459550858], action=2, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.5014554262161255, 0.014436060562729836], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.48724493384361267, 0.01421047281473875], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.47438815236091614, 0.012856771238148212], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4629438519477844, 0.011444312520325184], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4520135819911957, 0.010930274613201618], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4406523108482361, 0.011361273005604744], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4289926290512085, 0.011659683659672737], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4170677661895752, 0.011924856342375278], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.40498480200767517, 0.012082959525287151], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.39485326409339905, 0.0101315313950181], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.38574114441871643, 0.00911213830113411], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3766595125198364, 0.009081636555492878], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.369689404964447, 0.006970102898776531], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.36292076110839844, 0.006768625695258379], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.35737136006355286, 0.005549421533942223], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.35405048727989197, 0.0033208648674190044], action=0, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.3520340919494629, 0.002016401616856456], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3513394594192505, 0.0006946252542547882], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.35291537642478943, -0.0015759352827444673], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3567485213279724, -0.0038331435061991215], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.36281222105026245, -0.006063672713935375], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.37111544609069824, -0.008303239941596985], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3815385103225708, -0.010423053987324238], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.39401867985725403, -0.01248016394674778], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.40852653980255127, -0.014507866464555264], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4229464530944824, -0.014419922605156898], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4381413459777832, -0.01519489660859108], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.45504310727119446, -0.016901744529604912], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4715171456336975, -0.01647404581308365], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.48738911747932434, -0.015871990472078323], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5026156902313232, -0.015226575545966625], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5180744528770447, -0.015458764508366585], action=1, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.5346407294273376, -0.01656624674797058], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5502002835273743, -0.015559577383100986], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5655913352966309, -0.015391037799417973], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.580707848072052, -0.015116524882614613], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5954720377922058, -0.014764164574444294], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6087357401847839, -0.013263698667287827], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6224268078804016, -0.013691089116036892], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6344326138496399, -0.012005803175270557], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6446875333786011, -0.010254932567477226], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6550790667533875, -0.01039151195436716], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6655871272087097, -0.010508077219128609], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6761165261268616, -0.010529402643442154], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6855800747871399, -0.009463531896471977], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6949157118797302, -0.009335625916719437], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7020325660705566, -0.007116891909390688], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7088835835456848, -0.006851015146821737], action=0, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.7134426236152649, -0.004559000488370657], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7166749238967896, -0.003232320537790656], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7175968885421753, -0.0009219487546943128], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7161648869514465, 0.0014320043846964836], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7144234776496887, 0.0017414102330803871], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7123640179634094, 0.0020594156812876463], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7080194354057312, 0.0043446095660328865], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7034112811088562, 0.004608160350471735], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6975778341293335, 0.005833426956087351], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6905058026313782, 0.007072017528116703], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6832737326622009, 0.0072320797480642796], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6749354004859924, 0.008338326588273048], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6665656566619873, 0.008369768969714642], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6562276482582092, 0.01033798884600401], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6439602375030518, 0.0122674023732543], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6308681964874268, 0.01309208944439888], action=1, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.6179919242858887, 0.01287624891847372], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6054747104644775, 0.012517208233475685], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5924240946769714, 0.013050619512796402], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5779029726982117, 0.01452112477272749], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5630552768707275, 0.014847713522613049], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5469259023666382, 0.016129380092024803], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5316612720489502, 0.015264620073139668], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5153769254684448, 0.016284311190247536], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4991815686225891, 0.016195377334952354], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4842510223388672, 0.014930551871657372], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4696967303752899, 0.014554290100932121], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4555695056915283, 0.014127235859632492], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.44098836183547974, 0.014581117779016495], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4270269572734833, 0.01396140456199646], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.41281992197036743, 0.014207040891051292], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.40045100450515747, 0.012368935160338879], action=0, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.3880036175251007, 0.01244736835360527], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3776231110095978, 0.010380509309470654], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.36735254526138306, 0.010270562022924423], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3582353889942169, 0.009117155335843563], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3493681848049164, 0.008867214433848858], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3408268094062805, 0.008541373535990715], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.33464527130126953, 0.006181542761623859], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3298782706260681, 0.004767003934830427], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3255375921726227, 0.004340675659477711], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3226231038570404, 0.002914493205025792], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3201935291290283, 0.002429576823487878], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3182680010795593, 0.0019255157094448805], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.31783127784729004, 0.00043674063635990024], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.31792035698890686, -8.909193275030702e-05], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3194589614868164, -0.001538618584163487], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.32143986225128174, -0.0019808781798928976], action=2, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.325864315032959, -0.0044244700111448765], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.33178091049194336, -0.005916579160839319], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.33813926577568054, -0.006358354818075895], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.34488722681999207, -0.006747981999069452], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3539414703845978, -0.009054237976670265], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.36426451802253723, -0.01032303087413311], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3747933506965637, -0.01052883267402649], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.38747894763946533, -0.012685599736869335], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.40021443367004395, -0.01273548137396574], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4139271676540375, -0.01371276006102562], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4274798035621643, -0.013552634976804256], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4407578706741333, -0.01327804010361433], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.45366397500038147, -0.012906118296086788], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.46612638235092163, -0.012462400831282139], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.48008251190185547, -0.01395613607019186], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.49541884660720825, -0.015336339361965656], action=0, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.5110468864440918, -0.015628015622496605], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5278379321098328, -0.016791081055998802], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.545660674571991, -0.017822707071900368], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5633586049079895, -0.017697935923933983], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5817859768867493, -0.018427401781082153], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6008396744728088, -0.019053664058446884], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6193856596946716, -0.018545987084507942], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.637301504611969, -0.017915872856974602], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6544350981712341, -0.01713356003165245], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6696704030036926, -0.015235304832458496], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6829133629798889, -0.013242983259260654], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6950198411941528, -0.012106481939554214], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7059363126754761, -0.010916488245129585], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7155656218528748, -0.009629311040043831], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7228646874427795, -0.007299051620066166], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.727802574634552, -0.004937894642353058], action=2, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.7323552966117859, -0.004552681930363178], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7365039587020874, -0.004148663487285376], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7391624450683594, -0.002658500336110592], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7393393516540527, -0.00017694027337711304], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7370163798332214, 0.0023229853250086308], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7322428822517395, 0.004773502238094807], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.726075291633606, 0.0061676232144236565], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7195442318916321, 0.006531034130603075], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7116959095001221, 0.00784832239151001], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7035622596740723, 0.008133679628372192], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6951493620872498, 0.008412868715822697], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.685579240322113, 0.00957013014703989], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6759072542190552, 0.009671996347606182], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6661924719810486, 0.009714777581393719], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6544710993766785, 0.01172136515378952], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6428095102310181, 0.011661573313176632], action=0, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.6303207278251648, 0.012488802894949913], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6171078085899353, 0.01321292296051979], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6042796969413757, 0.012828096747398376], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5898687839508057, 0.014410939998924732], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5740284323692322, 0.0158403180539608], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5588198900222778, 0.015208548866212368], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5443716645240784, 0.014448233880102634], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5297821164131165, 0.014589549042284489], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.516205370426178, 0.013576729223132133], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5017431974411011, 0.014462168328464031], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4885178804397583, 0.013225332833826542], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4756024479866028, 0.012915444560348988], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4631201922893524, 0.012482237070798874], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4511267840862274, 0.011993417516350746], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4397275149822235, 0.011399279348552227], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4290108382701874, 0.010716652497649193], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4200049042701721, 
0.009005934000015259], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4128021001815796, 0.007202820852398872], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.40741825103759766, 0.005383845418691635], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.40391895174980164, 0.0034993106964975595], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4013022184371948, 0.0026167226023972034], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.39861592650413513, 0.002686290303245187], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3979094922542572, 0.0007064350065775216], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.39715445041656494, 0.0007550478912889957], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3983420729637146, -0.0011876259231939912], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.40145790576934814, -0.0031158505007624626], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4045571982860565, -0.0030992808751761913], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4085308015346527, -0.003973587416112423], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4143548905849457, -0.0058241174556314945], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4220552146434784, -0.007700301706790924], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4315321147441864, -0.009476897306740284], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4407605528831482, -0.009228438138961792], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.45069050788879395, 
-0.009929952211678028], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4602159261703491, -0.009525422006845474], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4692898392677307, -0.009073925204575062], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.47880491614341736, -0.009515080600976944], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.48771384358406067, -0.008908935822546482], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4969015419483185, -0.009187698364257812], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5073259472846985, -0.010424376465380192], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5179196000099182, -0.010593641549348831], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5296153426170349, -0.011695751920342445], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5402873754501343, -0.010672078467905521], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5498591065406799, -0.009571706876158714], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5602630376815796, -0.010403917171061039], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5694349408149719, -0.009171937592327595], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5782870650291443, -0.008852123282849789], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5877386331558228, -0.009451562538743019], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5957865715026855, -0.008047904819250107], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6042987108230591, 
-0.008512159809470177], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6122534871101379, -0.007954775355756283], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.619579553604126, -0.007326056715101004], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6252683997154236, -0.005688855424523354], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6302394866943359, -0.004971105605363846], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6354578137397766, -0.00521830515936017], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6399027705192566, -0.004444954916834831], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6444994211196899, -0.004596649669110775], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6492487192153931, -0.0047493199817836285], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6530811786651611, -0.0038324110209941864], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6549932956695557, -0.0019121168879792094], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.655981719493866, -0.000988428364507854], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6570541262626648, -0.0010724200401455164], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6582170724868774, -0.0011629696236923337], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.658441960811615, -0.00022485421504825354], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.656751811504364, 0.0016900997143238783], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6530940532684326, 
0.0036577729042619467], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6495200395584106, 0.003574010916054249], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6460414528846741, 0.003478623228147626], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.640717089176178, 0.005324335768818855], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6345970034599304, 0.006120077334344387], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6287298202514648, 0.005867178551852703], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6231008768081665, 0.005628987681120634], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6158061027526855, 0.00729477871209383], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6078464984893799, 0.007959546521306038], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5983390212059021, 0.009507495909929276], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5883203744888306, 0.010018663480877876], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5778712034225464, 0.010449165478348732], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5670374035835266, 0.01083378680050373], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5569324493408203, 0.010104939341545105], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5455999970436096, 0.011332503519952297], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5331589579582214, 0.012441040948033333], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5206758379936218, 0.012483114376664162], action=1, 
reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.509226381778717, 0.011449458077549934], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4989328980445862, 0.01029344741255045], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.48784950375556946, 0.011083412915468216], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.47807347774505615, 0.009776011109352112], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4697123169898987, 0.008361155167222023], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.46177026629447937, 0.007942062802612782], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.45530644059181213, 0.006463828030973673], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4494014382362366, 0.0059050158597528934], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4430883228778839, 0.0063130916096270084], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.43841326236724854, 0.004675072617828846], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.43340596556663513, 0.0050073047168552876], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4281042814254761, 0.0053016808815300465], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4245927929878235, 0.003511488903313875], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.42288094758987427, 0.0017118266550824046], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4209957718849182, 0.0018851953791454434], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.42095762491226196, 3.814747105934657e-05], action=0, reward=-1.0, 
info={}, terminal=False, timeout=False), RLTuple(state=[-0.42174312472343445, -0.0007855045259930193], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4232994616031647, -0.0015563315246254206], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4266331195831299, -0.003333666594699025], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.429718554019928, -0.0030854311771690845], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.43456727266311646, -0.004848729819059372], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.44109317660331726, -0.0065258825197815895], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.44723525643348694, -0.006142101250588894], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.45294877886772156, -0.005713511258363724], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.45921438932418823, -0.006265608128160238], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.46503114700317383, -0.0058167497627437115], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.47130078077316284, -0.006269639823585749], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4769633114337921, -0.005662544164806604], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4839801490306854, -0.00701683945953846], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.49131080508232117, -0.007330651395022869], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.49895626306533813, -0.007645446341484785], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5068724751472473, -0.0079162223264575], action=1, reward=-1.0, 
info={}, terminal=False, timeout=False), RLTuple(state=[-0.5159990191459656, -0.009126531891524792], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5262182950973511, -0.01021930668503046], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5374666452407837, -0.011248317547142506], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5496682524681091, -0.012201611883938313], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5627485513687134, -0.013080325908958912], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5755841732025146, -0.012835610657930374], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5890464782714844, -0.013462289236485958], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6020817160606384, -0.01303526945412159], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6155977249145508, -0.013515980914235115], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6275017857551575, -0.011904091574251652], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6386556625366211, -0.011153820902109146], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.648007869720459, -0.009352258406579494], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6565214395523071, -0.008513531647622585], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6630657911300659, -0.006544378120452166], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6695978045463562, -0.006532006431370974], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6760871410369873, -0.006489353720098734], action=0, reward=-1.0, info={}, 
terminal=False, timeout=False), RLTuple(state=[-0.6824992895126343, -0.006412100978195667], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6877821683883667, -0.005282883998006582], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6909377574920654, -0.003155582118779421], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6939419507980347, -0.0030041993595659733], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6947757005691528, -0.0008337512263096869], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6933992505073547, 0.001376432366669178], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6918315291404724, 0.0015676982002332807], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.690112292766571, 0.0017192679224535823], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6862346529960632, 0.0038776532746851444], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6822211742401123, 0.004013477358967066], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6780965924263, 0.004124533850699663], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6739128232002258, 0.004183819983154535], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6686930060386658, 0.005219768267124891], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6614609956741333, 0.007232058327645063], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6522759795188904, 0.009184998460114002], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6421912312507629, 0.010084722191095352], action=1, reward=-1.0, info={}, terminal=False, 
timeout=False), RLTuple(state=[-0.6302420496940613, 0.011949188075959682], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.61858731508255, 0.011654771864414215], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6052591800689697, 0.013328097760677338], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5923606157302856, 0.012898575514554977], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5789579153060913, 0.013402681797742844], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5661482214927673, 0.01280973944813013], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5520991086959839, 0.014049078337848186], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5368569493293762, 0.015242177061736584], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.52054762840271, 0.01630931720137596], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.504317581653595, 0.016230031847953796], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4882793724536896, 0.016038214787840843], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.471537321805954, 0.016742050647735596], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4562336802482605, 0.015303656458854675], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.44148075580596924, 0.014752926304936409], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4263475239276886, 0.015133209526538849], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.41094446182250977, 0.015403076075017452], action=2, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.39643123745918274, 0.014513232745230198], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.38289591670036316, 0.013535317033529282], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.37044757604599, 0.012448337860405445], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3601830005645752, 0.010264559648931026], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.35211896896362305, 0.008064037188887596], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.34434542059898376, 0.0077735427767038345], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.33791816234588623, 0.006427264306694269], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3338618576526642, 0.004056314937770367], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.33122673630714417, 0.002635112963616848], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3290420174598694, 0.0021847113966941833], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.32924360036849976, -0.00020156607206445187], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3318317234516144, -0.002588131697848439], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.33687058091163635, -0.005038846284151077], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.34225624799728394, -0.005385673139244318], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.34900376200675964, -0.0067475116811692715], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3580719232559204, -0.009068163111805916], action=0, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.3674141764640808, -0.009342269971966743], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.37789493799209595, -0.010480733588337898], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.38849321007728577, -0.010598300956189632], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.40009546279907227, -0.011602247133851051], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.411624550819397, -0.011529076844453812], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4239940941333771, -0.012369546107947826], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.43818914890289307, -0.014195054769515991], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4531082510948181, -0.014919092878699303], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4675756096839905, -0.01446738000959158], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4815209209918976, -0.013945311307907104], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4958301782608032, -0.014309239573776722], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5113540291786194, -0.015523867681622505], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5279697775840759, -0.016615718603134155], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5445771813392639, -0.016607416793704033], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5610790252685547, -0.016501856967806816], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5763028860092163, -0.015223845839500427], action=2, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.5921963453292847, -0.015893496572971344], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6076129078865051, -0.015416532754898071], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6234706044197083, -0.015857676044106483], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6386020183563232, -0.015131460502743721], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.652913510799408, -0.01431148312985897], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6673461198806763, -0.014432597905397415], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.680753231048584, -0.013407078571617603], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6930984854698181, -0.012345261871814728], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7032938003540039, -0.01019531860947609], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7122671008110046, -0.00897333212196827], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7199462056159973, -0.007679065223783255], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7262609601020813, -0.006314805708825588], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7301823496818542, -0.003921337891370058], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7336889505386353, -0.003506607376039028], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7347758412361145, -0.0010869352845475078], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7334256172180176, 0.0013502775691449642], action=2, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.7316352128982544, 0.0017903681145980954], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7274322509765625, 0.00420294888317585], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7228496074676514, 0.004582647699862719], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7159104943275452, 0.006939156912267208], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7086737751960754, 0.0072367205284535885], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7001261711120605, 0.008547564037144184], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6893459558486938, 0.010780231095850468], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6764005422592163, 0.012945408001542091], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6613928079605103, 0.015007734298706055], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6444595456123352, 0.016933247447013855], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6266781687736511, 0.01778138056397438], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6071386337280273, 0.01953953504562378], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5870216488838196, 0.02011696994304657], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5664898753166199, 0.020531808957457542], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5456752181053162, 0.020814642310142517], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5247661471366882, 0.020909080281853676], action=1, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.5048891305923462, 0.01987701654434204], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.48619693517684937, 0.018692191690206528], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.46681517362594604, 0.019381750375032425], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.44886642694473267, 0.01794877089560032], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.43247419595718384, 0.016392206773161888], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.41577446460723877, 0.01669975183904171], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.39888182282447815, 0.016892625018954277], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3819098472595215, 0.016971994191408157], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3669947683811188, 0.014915072359144688], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3542463183403015, 0.012748434208333492], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3417631685733795, 0.012483151629567146], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3306049704551697, 0.011158197186887264], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3198983669281006, 0.010706625878810883], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3106755018234253, 0.009222856722772121], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.30303290486335754, 0.007642601616680622], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.29601630568504333, 0.007016605231910944], action=2, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.28967025876045227, 0.0063460408709943295], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.28403350710868835, 0.005636733025312424], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.28109192848205566, 0.0029415965545922518], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.27988773584365845, 0.0012041914742439985], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.2813848555088043, -0.0014971329364925623], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.28554767370224, -0.004162797704339027], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.2903936803340912, -0.004846025258302689], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.2978971302509308, -0.0075034331530332565], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3060421347618103, -0.008145001716911793], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.315782755613327, -0.009740634821355343], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.32605504989624023, -0.010272285901010036], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.33875277638435364, -0.012697742320597172], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.35378047823905945, -0.015027700923383236], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3690316081047058, -0.015251122415065765], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3844608962535858, -0.015429284423589706], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4019831717014313, -0.017522292211651802], action=0, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.4204345643520355, -0.018451375886797905], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4397260546684265, -0.019291499629616737], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.45867860317230225, -0.018952535465359688], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4781266152858734, -0.019448010250926018], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.49891865253448486, -0.020792046561837196], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5189646482467651, -0.02004600502550602], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5401268601417542, -0.021162208169698715], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5622108578681946, -0.02208399958908558], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5840743780136108, -0.02186349220573902], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6045025587081909, -0.02042817696928978], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6233447790145874, -0.018842224031686783], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6414988040924072, -0.018154041841626167], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6597924828529358, -0.018293695524334908], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6781675815582275, -0.018375081941485405], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6964390277862549, -0.018271425738930702], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7145442366600037, -0.018105221912264824], action=0, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.7303147912025452, -0.015770575031638145], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7436939477920532, -0.013379121199250221], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7556087374687195, -0.011914811097085476], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7659285068511963, -0.010319755412638187], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.773654580116272, -0.007726110052317381], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.778687596321106, -0.005032968241721392], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7830034494400024, -0.00431587640196085], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7865839004516602, -0.003580466378480196], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7884508967399597, -0.0018670061836019158], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7876027226448059, 0.0008481730474159122], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7850154042243958, 0.002587367547675967], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7797186970710754, 0.0052967071533203125], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7737062573432922, 0.006012422498315573], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7659972906112671, 0.007708963006734848], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7566447257995605, 0.009352549910545349], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7447019219398499, 0.011942818760871887], action=2, reward=-1.0, info={}, terminal=False, timeout=False), 
RLTuple(state=[-0.7322293519973755, 0.012472577393054962], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7182972431182861, 0.013932103291153908], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7040194272994995, 0.014277790673077106], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6885156035423279, 0.01550386007875204], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6728547215461731, 0.015660854056477547], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6551430225372314, 0.017711684107780457], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6365476250648499, 0.018595430999994278], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6171948909759521, 0.0193527489900589], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5981734991073608, 0.01902136020362377], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5796585083007812, 0.018514998257160187], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.56080162525177, 0.018856866285204887], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5407388210296631, 0.02006283961236477], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5215705633163452, 0.019168226048350334], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.503484845161438, 0.018085716292262077], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.48656973242759705, 0.01691511832177639], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.469944030046463, 0.01662570610642433], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4537270665168762, 
0.016216974705457687], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4381045401096344, 0.015622529201209545], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4231818914413452, 0.014922629110515118], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4080805480480194, 0.015101362019777298], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.39287450909614563, 0.015206040814518929], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3786363899707794, 0.014238100498914719], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.36448612809181213, 0.014150275848805904], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.35248497128486633, 0.012001140974462032], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.342759370803833, 0.009725620970129967], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3354145884513855, 0.007344760466367006], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3294576406478882, 0.005956971552222967], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3258935809135437, 0.0035640408750623465], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3227674067020416, 0.0031261842232197523], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.32106882333755493, 0.0016985831316560507], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3198477327823639, 0.0012210874119773507], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3210728168487549, -0.0012250767322257161], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3247338831424713, 
-0.003661075606942177], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3308294415473938, -0.006095568183809519], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3383316397666931, -0.007502185646444559], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.34616410732269287, -0.007832463830709457], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3553199768066406, -0.009155881591141224], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3656858801841736, -0.010365903377532959], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3782557249069214, -0.012569841928780079], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3919236660003662, -0.013667936436831951], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4066176116466522, -0.014693958684802055], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4232385456562042, -0.01662091724574566], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.44062361121177673, -0.01738506741821766], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.459701806306839, -0.01907818578183651], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.48028603196144104, -0.020584244281053543], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5022544860839844, -0.021968428045511246], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5254397988319397, -0.023185359314084053], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.548629105091095, -0.023189257830381393], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5726872086524963, 
-0.024058133363723755], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5974510312080383, -0.024763835594058037], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6216883659362793, -0.024237319827079773], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.645261824131012, -0.023573484271764755], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6669955253601074, -0.021733684465289116], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6877352595329285, -0.020739730447530746], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.707366406917572, -0.019631143659353256], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7247406840324402, -0.017374251037836075], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7407162189483643, -0.015975531190633774], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7551852464675903, -0.014469083398580551], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7681260108947754, -0.012940741144120693], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.779404878616333, -0.01127886027097702], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7879487872123718, -0.008543896488845348], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7947526574134827, -0.006803907919675112], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7997588515281677, -0.00500617315992713], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.8019497394561768, -0.0021909084171056747], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.8033525347709656, 
-0.0014027844881638885], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.8019341230392456, 0.0014184153405949473], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7996881008148193, 0.0022460310719907284], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7946069240570068, 0.005081195384263992], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7867714166641235, 0.007835503667593002], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7771750688552856, 0.009596336632966995], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7668596506118774, 0.010315393097698689], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7549125552177429, 0.01194709911942482], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7424086332321167, 0.012503966689109802], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7274405360221863, 0.014968070201575756], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.7100947499275208, 0.017345808446407318], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6924359202384949, 0.017658784985542297], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.673596203327179, 0.018839722499251366], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6537235379219055, 0.019872702658176422], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6329005360603333, 0.02082298882305622], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.6113160252571106, 0.02158450335264206], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5881142616271973, 0.023201745003461838], 
action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5634810328483582, 0.024633264169096947], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5375595092773438, 0.02592148631811142], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.5105500221252441, 0.027009490877389908], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4826750159263611, 0.0278750229626894], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.45514458417892456, 0.027530424296855927], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.42815732955932617, 0.02698727324604988], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.4019611179828644, 0.026196204125881195], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3766772150993347, 0.02528388984501362], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.3515452444553375, 0.02513197623193264], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.32667064666748047, 0.024874597787857056], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.30227068066596985, 0.024399951100349426], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.28050151467323303, 0.021769186481833458], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.26040002703666687, 0.020101478323340416], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.24210317432880402, 0.0182968620210886], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.2247602641582489, 0.01734289713203907], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.20946234464645386, 0.015297923237085342], action=1, reward=-1.0, info={}, 
terminal=False, timeout=False), RLTuple(state=[-0.19626384973526, 0.013198498636484146], action=1, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.1842312514781952, 0.012032595463097095], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.17533086240291595, 0.008900394663214684], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.16965334117412567, 0.005677512381225824], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.16717015206813812, 0.0024831995833665133], action=0, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.16591306030750275, 0.0012570895487442613], action=2, reward=-1.0, info={}, terminal=False, timeout=False), RLTuple(state=[-0.16685937345027924, -0.000946320709772408], action=1, reward=-1.0, info={}, terminal=False, timeout=True)])]

Training an Offline RL Agent

Finally, we use d3rlpy to train an offline RL agent on this environment. We do not train it to convergence; the goal is only to show how to get from a PyTupli dataset to offline RL training. Our TupliDataset provides a method, convert_to_tensors, that converts all episodes into NumPy arrays of states, actions, rewards, terminals, and timeouts. This conversion can be customized if other output formats are required. From these arrays we then create an MDPDataset, which is the input format used by d3rlpy algorithms.

[14]:
obs, act, rew, terminal, truncated = mdp_dataset.convert_to_tensors(parser=NumpyTupleParser)
# create d3rlpy dataset
d3rlpy_dataset = MDPDataset(
    observations=obs, actions=act, rewards=rew, terminals=terminal, timeouts=truncated
)
2026-04-15 10:35.38 [info     ] Signatures have been automatically determined. action_signature=Signature(dtype=[dtype('float32')], shape=[(1,)]) observation_signature=Signature(dtype=[dtype('float32')], shape=[(2,)]) reward_signature=Signature(dtype=[dtype('float32')], shape=[(1,)])
2026-04-15 10:35.38 [info     ] Action-space has been automatically determined. action_space=<ActionSpace.DISCRETE: 2>
2026-04-15 10:35.38 [info     ] Action size has been automatically determined. action_size=3
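
To make the array layout concrete, here is a minimal NumPy-only sketch of how recorded episodes might be flattened into the five aligned arrays used above. The `Step` record and the `flatten_episodes` helper are hypothetical stand-ins written for illustration; they are not PyTupli's parser. The only assumption taken from the recorded output is that each tuple carries a state, an action, a reward, and `terminal`/`timeout` flags.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Step:
    """Hypothetical stand-in for one recorded RL tuple."""

    state: list
    action: int
    reward: float
    terminal: bool = False
    timeout: bool = False


def flatten_episodes(episodes):
    """Concatenate all steps of all episodes into flat, index-aligned arrays."""
    steps = [step for episode in episodes for step in episode]
    observations = np.array([s.state for s in steps], dtype=np.float32)
    actions = np.array([s.action for s in steps], dtype=np.int64)
    rewards = np.array([s.reward for s in steps], dtype=np.float32)
    terminals = np.array([s.terminal for s in steps], dtype=np.float32)
    timeouts = np.array([s.timeout for s in steps], dtype=np.float32)
    return observations, actions, rewards, terminals, timeouts


# Two toy episodes: the first is cut off by a time limit,
# the second ends in a terminal state.
episodes = [
    [Step([-0.5, 0.0], 2, -1.0), Step([-0.49, 0.01], 2, -1.0, timeout=True)],
    [Step([0.4, 0.05], 2, -1.0), Step([0.5, 0.06], 1, -1.0, terminal=True)],
]
toy_obs, toy_act, toy_rew, toy_terminal, toy_truncated = flatten_episodes(episodes)
print(toy_obs.shape, toy_act.shape, toy_terminal, toy_truncated)
```

Because the arrays are flat, the episode boundaries survive only through the two flag arrays: a step with either flag set is the last step of its episode.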

Next, we check that training an agent with conservative Q-learning (CQL) works on this data:

[ ]:
# algorithm for offline training: CQL from d3rlpy
d3rlpy.seed(1)  # for reproducibility
algo = DiscreteCQLConfig(batch_size=64, alpha=2.0, target_update_interval=1000).create(device='cpu')
# train
algo.fit(dataset=d3rlpy_dataset, n_steps=10000, n_steps_per_epoch=100)
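
For intuition about what `alpha` weights, the sketch below computes the conservative penalty that discrete CQL adds to the usual Q-learning loss: the log-sum-exp of the Q-values over all actions minus the Q-value of the action that appears in the dataset. This is a simplified NumPy illustration under that textbook formulation, not d3rlpy's implementation; the function name `cql_penalty` and the toy Q-values are made up for the example.

```python
import numpy as np


def cql_penalty(q_values, dataset_actions):
    """Batch mean of logsumexp_a Q(s, a) - Q(s, a_data)."""
    q_values = np.asarray(q_values, dtype=np.float64)
    # numerically stable log-sum-exp over the action axis
    q_max = q_values.max(axis=1, keepdims=True)
    logsumexp = q_max[:, 0] + np.log(np.exp(q_values - q_max).sum(axis=1))
    # Q-value of the action actually taken in the dataset
    q_data = q_values[np.arange(len(q_values)), dataset_actions]
    return float(np.mean(logsumexp - q_data))


# Toy batch: 2 states, 3 discrete actions (as in MountainCar)
q = np.array([[1.0, 2.0, 0.5], [0.0, 0.0, 0.0]])
actions_in_data = np.array([1, 2])
alpha = 2.0  # same weight as in the config above
penalty = alpha * cql_penalty(q, actions_in_data)
print(penalty)
```

The penalty is never negative, and it shrinks when the Q-value of the dataset action dominates the others, so a larger `alpha` pushes the agent harder towards actions that were actually observed.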

Storing and Retrieving Policy Parameters as Artifacts

Now let’s store the trained policy parameters as an artifact associated with our benchmark. This demonstrates how to link artifacts to specific benchmarks for better organization and retrieval.

[17]:
# Save the model to a temporary file
with tempfile.NamedTemporaryFile(suffix='.pt', delete=False) as temp_file:
    temp_path = temp_file.name

# Save the model using d3rlpy's save_model method
algo.save_model(temp_path)

# Read the file content as bytes
with open(temp_path, 'rb') as f:
    policy_artifact = f.read()

# Clean up the temporary file
os.unlink(temp_path)

# Create metadata for the artifact, linking it to our benchmark
policy_metadata = ArtifactMetadata(
    name='trained_cql_policy',
    description='Trained CQL policy parameters for MountainCar environment',
    benchmark_id=loaded_tupli_env.id,  # Link artifact to the benchmark
)

# Store the artifact
stored_policy = storage.store_artifact(artifact=policy_artifact, metadata=policy_metadata)
print(f'Stored policy artifact with ID: {stored_policy.id}')
print(f'Artifact linked to benchmark: {stored_policy.benchmark_id}')
Stored policy artifact with ID: ad2b9b94163545989604a22f3af1a3de
Artifact linked to benchmark: ca2fd26d462d465d8ee41d64b042c2a1
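
The save-to-file, read-bytes, delete-file pattern used above can be wrapped in small helpers so that the temporary file is removed even if saving or loading raises. The following is a sketch using only the standard library; `save_to_bytes`, `load_from_bytes`, and the `fake_save`/`fake_load` callbacks are hypothetical names introduced here, not part of PyTupli or d3rlpy.

```python
import os
import tempfile


def save_to_bytes(save_fn, suffix='.pt'):
    """Call save_fn(path) on a temporary path and return the file content as bytes."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # save_fn opens the path itself
    try:
        save_fn(path)
        with open(path, 'rb') as f:
            return f.read()
    finally:
        os.unlink(path)  # removed even if save_fn raises


def load_from_bytes(data, load_fn, suffix='.pt'):
    """Write data to a temporary path, call load_fn(path), then clean up."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(data)
        return load_fn(path)
    finally:
        os.unlink(path)


# Stand-in callbacks that mimic a model writing and reading its weights
def fake_save(path):
    with open(path, 'wb') as f:
        f.write(b'weights')


def fake_load(path):
    with open(path, 'rb') as f:
        return f.read()


blob = save_to_bytes(fake_save)
restored = load_from_bytes(blob, fake_load)
print(restored == blob)
```

With the objects from this tutorial, `save_to_bytes(algo.save_model)` would produce the same bytes as the cell above.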

Another collaborator could now retrieve this artifact by listing all artifacts associated with our benchmark and downloading the one they need.

[18]:
# Create a filter to find artifacts associated with our benchmark
benchmark_filter = FilterEQ(key='benchmark_id', value=loaded_tupli_env.id)

# List all artifacts associated with this benchmark
benchmark_artifacts = storage.list_artifacts(filter=benchmark_filter)

print(
    f'Found {len(benchmark_artifacts)} artifacts associated with benchmark {loaded_tupli_env.id}:'
)
for artifact in benchmark_artifacts:
    print(f'  - ID: {artifact.id}')
    print(f'    Name: {artifact.name}')
    print(f'    Description: {artifact.description}')
    print(f'    Benchmark ID: {artifact.benchmark_id}')
    print(f'    Created: {artifact.created_at}')
    print()
Found 1 artifacts associated with benchmark ca2fd26d462d465d8ee41d64b042c2a1:
  - ID: ad2b9b94163545989604a22f3af1a3de
    Name: trained_cql_policy
    Description: Trained CQL policy parameters for MountainCar environment
    Benchmark ID: ca2fd26d462d465d8ee41d64b042c2a1
    Created: 2026-04-15T10:38:55.857725
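
Once several artifacts are linked to the same benchmark, a collaborator will usually want the most recent one with a given name. The sketch below shows one way to select it on the client side. `ArtifactRecord` is a hypothetical stand-in for the metadata objects printed above (only the `id`, `name`, and `created_at` fields are mirrored, with the timestamp assumed to be an ISO 8601 string), and the example records are made up.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ArtifactRecord:
    # hypothetical stand-in for the metadata objects returned by the listing
    id: str
    name: str
    created_at: str  # ISO 8601 timestamp


def latest_by_name(records, name):
    """Return the most recently created record with the given name, or None."""
    matches = [r for r in records if r.name == name]
    if not matches:
        return None
    return max(matches, key=lambda r: datetime.fromisoformat(r.created_at))


# made-up example records
records = [
    ArtifactRecord('a1', 'trained_cql_policy', '2026-04-15T10:38:55.857725'),
    ArtifactRecord('a2', 'trained_cql_policy', '2026-04-16T09:00:00'),
    ArtifactRecord('a3', 'other_artifact', '2026-04-17T08:00:00'),
]
newest = latest_by_name(records, 'trained_cql_policy')
print(newest.id)
```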

Next, we download the stored artifact and deserialize it into a new algorithm instance.

[19]:
# Load the policy artifact
loaded_policy_artifact = storage.load_artifact(stored_policy.id)

# Write the bytes to a temporary file
with tempfile.NamedTemporaryFile(suffix='.pt', delete=False) as temp_file:
    temp_path = temp_file.name
    temp_file.write(loaded_policy_artifact)

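# Create a new algorithm instance, build its networks from the environment's
# observation and action spaces, and load the stored parameters
loaded_algo = DiscreteCQLConfig().create(device='cpu')
loaded_algo.build_with_env(loaded_tupli_env)
loaded_algo.load_model(temp_path)

# Clean up the temporary file
os.unlink(temp_path)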

print('Successfully loaded trained CQL policy!')
Successfully loaded trained CQL policy!
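
To check that the downloaded bytes are identical to what was uploaded, one option is to compare SHA-256 digests. The sketch below uses only `hashlib`. `sha256_hex` is a hypothetical helper name, and the two byte strings are stand-ins for `policy_artifact` and `loaded_policy_artifact`.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


uploaded = b'model-parameters'   # stand-in for the bytes passed to the storage
downloaded = bytes(uploaded)     # stand-in for the bytes returned by the storage

if sha256_hex(uploaded) == sha256_hex(downloaded):
    print('Artifact content is unchanged')
else:
    print('Artifact content differs')
```

In this notebook, the comparison would be between `sha256_hex(policy_artifact)` and `sha256_hex(loaded_policy_artifact)`, assuming the storage returns the artifact as bytes.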

Testing the Trained Policy

Now, let us test the trained (and loaded) policy:

[20]:
# activate rendering
setattr(loaded_tupli_env.unwrapped, 'render_mode', 'human')
# deactivate recording of episodes
loaded_tupli_env.deactivate_recording()
# run the environment
np.random.seed(seed=42)
obs, info = loaded_tupli_env.reset(seed=42)

for step in range(800):
    action = np.int64(loaded_algo.predict(np.expand_dims(obs, axis=0))[0])
    obs, reward, terminated, truncated, info = loaded_tupli_env.step(action)
    if terminated or truncated:
        print(f'Episode finished after {step + 1} timesteps')
        obs, info = loaded_tupli_env.reset()
# close the environment (this also closes the render window)
loaded_tupli_env.close()
Episode finished after 375 timesteps
Episode finished after 694 timesteps

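The trained policy manages to reach the flag even though it has only learned from random actions! Note that the loop counter is not reset between episodes, so the printed step counts are cumulative: the second episode took 694 - 375 = 319 steps.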
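
The test loop prints the running step index of the whole 800-step loop. To obtain the length and return of each episode separately, keep per-episode counters and reset them whenever an episode ends. The sketch below shows this bookkeeping on a tiny stand-in environment that follows the Gymnasium `reset`/`step` return conventions. `CountdownEnv` and `rollout` are hypothetical names and are not part of PyTupli or Gymnasium.

```python
class CountdownEnv:
    """Hypothetical stand-in environment: every episode terminates after `length` steps."""

    def __init__(self, length):
        self.length = length
        self.t = 0

    def reset(self, seed=None):
        self.t = 0
        return 0.0, {}

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.length
        # Gymnasium order: observation, reward, terminated, truncated, info
        return float(self.t), -1.0, terminated, False, {}


def rollout(env, policy, total_steps):
    """Run `policy` for `total_steps` steps; return (length, return) of each finished episode."""
    episodes = []
    obs, info = env.reset()
    ep_len, ep_return = 0, 0.0
    for _ in range(total_steps):
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        ep_len += 1
        ep_return += reward
        if terminated or truncated:
            episodes.append((ep_len, ep_return))
            obs, info = env.reset()
            ep_len, ep_return = 0, 0.0  # reset the per-episode counters
    return episodes


episodes = rollout(CountdownEnv(length=3), policy=lambda obs: 0, total_steps=10)
print(episodes)
```

With the notebook's objects, `env` would be `loaded_tupli_env` and `policy` could be `lambda obs: int(loaded_algo.predict(np.expand_dims(obs, axis=0))[0])`.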

Deleting Benchmarks

To clean up our storage, we now delete the benchmark and all related artifacts. Episodes will automatically be deleted, too.

[21]:
# Clean up: First delete the policy artifact we created
print(f'Deleting policy artifact: {stored_policy.id}')
storage.delete_artifact(stored_policy.id)

# Then delete the benchmark and all remaining related artifacts
# Episodes will automatically be deleted too
print(f'Deleting benchmark: {loaded_tupli_env.id}')
loaded_tupli_env.delete(delete_artifacts=True)

print('Cleanup completed!')
Deleting policy artifact: ad2b9b94163545989604a22f3af1a3de
Deleting benchmark: ca2fd26d462d465d8ee41d64b042c2a1
Cleanup completed!