pytupli.benchmark.TupliEnvWrapper

class TupliEnvWrapper(env: Env, storage: TupliStorage, benchmark_id: str | None = None, metadata_callback: EpisodeMetadataCallback | None = None, rl_tuple_cls: type[RLTuple] = RLTuple)

Bases: Wrapper

A wrapper for Gymnasium environments that serializes and deserializes them so that reproducible benchmarks can be created from environments. It handles the interface to the storage backend, including storing, loading, and publishing benchmarks. It also lets users record interactions with Gymnasium environments to the storage so that they can be used as datasets for offline RL.

Parameters:

env (Env) – The Gymnasium environment to wrap
storage (TupliStorage) – Storage backend for saving benchmark and episode data
benchmark_id (str | None) – Identifier for the benchmark. Defaults to None
metadata_callback (EpisodeMetadataCallback | None) – Callback for generating episode metadata. Defaults to None
rl_tuple_cls (type[RLTuple]) – Class to use for creating RL tuples. Defaults to RLTuple

Wraps an environment to allow a modular transformation of the step() and reset() methods.


Methods

`activate_recording`	Activates the recording of environment interactions.
`class_name`	Returns the class name of the wrapper.
`close`	Closes the wrapper and `env`.
`deactivate_recording`	Deactivates the recording of environment interactions.
`delete`	Deletes the benchmark and optionally its related data from storage.
`deserialize_env`	Deserializes a JSON string back into a Gymnasium environment.
`get_wrapper_attr`	Gets an attribute from the wrapper and lower environments if name doesn't exist in this object.
`has_wrapper_attr`	Checks if the given attribute is within the wrapper or its environment.
`load`	Loads a benchmark from storage.
`publish`	Publishes the benchmark, making it available for other users depending on their access rights.
`render`	Uses the `render()` of the `env` that can be overwritten to change the returned data.
`reset`	Resets the environment and returns the initial observation.
`serialize_env`	Serializes a Gymnasium environment to a JSON string.
`set_wrapper_attr`	Sets an attribute on this wrapper or lower environment if name is already defined.
`step`	Takes a step in the environment and optionally records the interaction.
`store`	Stores the benchmark in the storage backend.
`wrapper_spec`	Generates a WrapperSpec for the wrappers.

Attributes

`action_space`	Return the `Env` `action_space` unless overwritten then the wrapper `action_space` is used.
`metadata`	Returns the `Env` `metadata`.
`np_random`	Returns the `Env` `np_random` attribute.
`np_random_seed`	Returns the base environment's `np_random_seed`.
`observation_space`	Return the `Env` `observation_space` unless overwritten then the wrapper `observation_space` is used.
`render_mode`	Returns the `Env` `render_mode`.
`spec`	Returns the `Env` `spec` attribute with the WrapperSpec if the wrapper inherits from EzPickle.
`unwrapped`	Returns the base environment of the wrapper.

classmethod _deserialize(env: Env, storage: TupliStorage) → Env

Internal method for environment deserialization.

This method is meant to be overridden by subclasses to implement custom deserialization behavior, e.g., for artifacts such as csv files or trained models.

Parameters:

env (Env) – The environment to deserialize
storage (TupliStorage) – Storage backend for loading artifacts

Returns:

Env – The deserialized environment

_get_hash(obj: Any) → str

Generates a hash for a given object using JSON serialization.

Parameters:

obj (Any) – The object to hash

Returns:

str – SHA-256 hash of the serialized object
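
The idea of hashing an object through its JSON representation can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: the function name `get_hash` and the use of `sort_keys=True` are assumptions made here so that equal dictionaries produce equal digests.

```python
import hashlib
import json
from typing import Any


def get_hash(obj: Any) -> str:
    """Return the SHA-256 hex digest of an object's JSON representation."""
    # Assumption: keys are sorted so that dictionaries with the same content
    # hash identically regardless of insertion order.
    serialized = json.dumps(obj, sort_keys=True)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


digest_a = get_hash({"name": "demo", "version": "1.0"})
digest_b = get_hash({"version": "1.0", "name": "demo"})
print(len(digest_a))         # 64 hexadecimal characters
print(digest_a == digest_b)  # True, because keys are sorted before hashing
```

A content-derived hash of this kind makes it possible to detect whether two benchmark definitions are identical without comparing them field by field.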

_prepare_storage(metadata: BenchmarkMetadata) → tuple[str, str]

Prepares benchmark data for storage.

Parameters:

metadata (BenchmarkMetadata) – Metadata for the benchmark

Returns:

tuple[str, str] – Tuple containing the benchmark hash and serialized environment

_serialize(env: Env) → tuple[Env, list]

Internal method for environment serialization.

This method is meant to be overridden by subclasses to implement custom serialization behavior, e.g., for artifacts such as csv files or trained models.

Parameters:

env (Env) – The environment to serialize

Returns:

tuple[Env, list] – Tuple containing the processed environment and list of related artifacts
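
The pattern behind the `_serialize` and `_deserialize` hooks can be illustrated with stand-in classes. The sketch below is not pytupli code: `InMemoryStorage`, `ToyEnv`, and the free functions `serialize` and `deserialize` are hypothetical names, and the storage is passed explicitly to both functions for simplicity. Only the return shapes (an environment plus a list of artifacts, and an environment) follow the signatures documented here.

```python
from typing import Any


class InMemoryStorage:
    """Hypothetical stand-in for a storage backend; not the TupliStorage API."""

    def __init__(self) -> None:
        self._artifacts: dict = {}

    def save(self, payload: Any) -> str:
        artifact_id = f"artifact-{len(self._artifacts)}"
        self._artifacts[artifact_id] = payload
        return artifact_id

    def fetch(self, artifact_id: str) -> Any:
        return self._artifacts[artifact_id]


class ToyEnv:
    """Hypothetical environment that carries a bulky data attribute."""

    def __init__(self, price_table: Any) -> None:
        self.price_table = price_table


def serialize(env: ToyEnv, storage: InMemoryStorage) -> tuple:
    # Move the bulky attribute into storage and leave only its ID on the env.
    artifact_id = storage.save(env.price_table)
    env.price_table = artifact_id
    return env, [artifact_id]


def deserialize(env: ToyEnv, storage: InMemoryStorage) -> ToyEnv:
    # Resolve the stored ID back into the original data.
    env.price_table = storage.fetch(env.price_table)
    return env


storage = InMemoryStorage()
env = ToyEnv(price_table=[[1.0, 2.0], [3.0, 4.0]])
env, artifacts = serialize(env, storage)
restored = deserialize(env, storage)
```

Replacing large attributes with references keeps the serialized environment small, while the artifact list records what must be available in storage before the environment can be restored.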

activate_recording()

Activates the recording of environment interactions.

When active, the wrapper will record tuples of (state, action, reward, etc.) and store them as episodes.

deactivate_recording()

Deactivates the recording of environment interactions.

When deactivated, the wrapper will not record or store any environment interactions.

delete(delete_artifacts: bool = False, delete_episodes: bool = True)

Deletes the benchmark and optionally its related data from storage.

Parameters:

delete_artifacts (bool, optional) – Whether to delete related artifacts. Defaults to False
delete_episodes (bool, optional) – Whether to delete related episodes. Defaults to True

Raises:

Exception – If deletion of benchmark, episodes, or artifacts fails

classmethod deserialize_env(serialized_env: str, storage: TupliStorage) → Env

Deserializes a JSON string back into a Gymnasium environment.

Parameters:

serialized_env (str) – The JSON string representation of the environment
storage (TupliStorage) – Storage backend for loading related artifacts

Returns:

Env – The deserialized Gymnasium environment

classmethod load(storage: TupliStorage, benchmark_id: str | None = None, metadata_callback: EpisodeMetadataCallback | None = None, rl_tuple_cls: type[RLTuple] = RLTuple) → TupliEnvWrapper

Loads a benchmark from storage.

Parameters:

storage (TupliStorage) – Storage backend to load from
benchmark_id (str | None) – ID of the benchmark to load. Defaults to None
metadata_callback (EpisodeMetadataCallback | None) – Callback for generating episode metadata. Defaults to None
rl_tuple_cls (type[RLTuple]) – Class to use for creating RL tuples. Defaults to RLTuple

Returns:

TupliEnvWrapper – A new wrapper instance with the loaded benchmark

publish() → None

Publishes the benchmark, making it available for other users depending on their access rights.

This method should be called after storing the benchmark when it’s ready to be used by others.

reset(*, seed: int | None = None, options: dict[str, Any] | None = None) → tuple[Any, dict[str, Any]]

Resets the environment and returns the initial observation.

Parameters:

seed (int | None) – Random seed for environment reset. Defaults to None
options (dict[str, Any] | None) – Additional options for reset. Defaults to None

Returns:

tuple[Any, dict[str, Any]] – Initial observation and info dictionary

serialize_env(env: Env) → str

Serializes a Gymnasium environment to a JSON string.

This method handles the serialization of the environment and any related artifacts.

Parameters:

env (Env) – The environment to serialize

Returns:

str – JSON string representation of the environment

step(action: Any) → tuple[Any, SupportsFloat, bool, bool, dict[str, Any]]

Takes a step in the environment and optionally records the interaction.

If recording is active, stores the interaction in the tuple buffer and creates an episode when the episode terminates.

Parameters:

action (Any) – The action to take in the environment

Returns:

tuple[Any, SupportsFloat, bool, bool, dict[str, Any]] –

Tuple containing:

observation: The environment observation
reward: The reward for the action
terminated: Whether the episode terminated naturally
truncated: Whether the episode was artificially terminated
info: Additional information from the environment
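
The recording behavior described for `step` can be sketched with a toy environment and a simplified recorder. Everything in this block is illustrative: `CountdownEnv` and `RecordingSketch` are hypothetical names, the buffer layout is invented, and flushing the buffer on either `terminated` or `truncated` is an assumption, since the text above only says an episode is created when the episode terminates.

```python
from typing import Any


class CountdownEnv:
    """Hypothetical toy environment that terminates after three steps."""

    def reset(self) -> tuple:
        self.remaining = 3
        return self.remaining, {}

    def step(self, action: Any) -> tuple:
        self.remaining -= 1
        terminated = self.remaining == 0
        return self.remaining, 1.0, terminated, False, {}


class RecordingSketch:
    """Illustrative recorder; names and behavior are assumptions."""

    def __init__(self, env: CountdownEnv) -> None:
        self.env = env
        self.recording = False
        self.buffer: list = []
        self.episodes: list = []
        self.last_obs = None

    def activate_recording(self) -> None:
        self.recording = True

    def deactivate_recording(self) -> None:
        self.recording = False

    def reset(self) -> tuple:
        obs, info = self.env.reset()
        self.last_obs = obs
        return obs, info

    def step(self, action: Any) -> tuple:
        obs, reward, terminated, truncated, info = self.env.step(action)
        if self.recording:
            self.buffer.append((self.last_obs, action, reward, terminated, truncated))
            # Assumption: an episode ends on either termination or truncation.
            if terminated or truncated:
                self.episodes.append(self.buffer)
                self.buffer = []
        self.last_obs = obs
        return obs, reward, terminated, truncated, info


wrapper = RecordingSketch(CountdownEnv())
wrapper.activate_recording()
wrapper.reset()
done = False
while not done:
    _, _, terminated, truncated, _ = wrapper.step(action=0)
    done = terminated or truncated
print(len(wrapper.episodes), len(wrapper.episodes[0]))  # 1 3
```

With recording deactivated, the same loop would pass every call through to the environment and leave both the buffer and the episode list empty.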

store(name: str, description: str = '', difficulty: str | None = None, version: str | None = None, metadata: dict[str, Any] = {}) → str

Stores the benchmark in the storage backend.

Parameters:

name (str) – Name of the benchmark
description (str, optional) – Description of the benchmark. Defaults to '' (an empty string)
difficulty (str | None, optional) – Difficulty level of the benchmark. Defaults to None
version (str | None, optional) – Version of the benchmark. Defaults to None
metadata (dict[str, Any], optional) – Additional metadata. Defaults to {}

Returns:

str – The ID of the stored benchmark
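
A typical sequence, storing a benchmark and then publishing it, might look like the sketch below. The helper `store_and_publish` and the `StubWrapper` class are hypothetical: the stub only mimics the `store` and `publish` signatures documented on this page so the example runs without a storage backend, and the returned ID is a placeholder rather than a real ID format.

```python
from typing import Any, Optional


class StubWrapper:
    """Hypothetical stand-in that mimics only the documented method names."""

    def __init__(self) -> None:
        self.calls: list = []

    def store(self, name, description="", difficulty=None, version=None, metadata=None) -> str:
        self.calls.append(("store", name, version))
        return "benchmark-123"  # placeholder ID, not a real pytupli ID format

    def publish(self) -> None:
        self.calls.append(("publish",))


def store_and_publish(wrapper: Any, name: str, version: Optional[str] = None) -> str:
    """Store a benchmark, then publish it, and return the benchmark ID."""
    # A fresh dict is passed explicitly instead of relying on the default value.
    benchmark_id = wrapper.store(
        name=name,
        description="Example benchmark",
        version=version,
        metadata={"source": "docs-example"},
    )
    # Publishing comes after storing, as described for publish().
    wrapper.publish()
    return benchmark_id


stub = StubWrapper()
benchmark_id = store_and_publish(stub, name="demo-benchmark", version="0.1")
print(benchmark_id)  # benchmark-123
```

Because the helper relies only on the method names, the same function could be called with a real `TupliEnvWrapper` instance in place of the stub, assuming the wrapper was constructed with a working storage backend.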