Environment explorers
All explorers are derived from nnabla_rl.environment_explorer.EnvironmentExplorer.
EnvironmentExplorer
- class nnabla_rl.environment_explorer.EnvironmentExplorerConfig(warmup_random_steps: int = 0, reward_scalar: float = 1.0, timelimit_as_terminal: bool = True, initial_step_num: int = 0)[source]¶
- class nnabla_rl.environment_explorer.EnvironmentExplorer(env_info: nnabla_rl.environments.environment_info.EnvironmentInfo, config: nnabla_rl.environment_explorer.EnvironmentExplorerConfig = EnvironmentExplorerConfig(warmup_random_steps=0, reward_scalar=1.0, timelimit_as_terminal=True, initial_step_num=0))[source]¶
Base class for environment exploration methods.
- abstract action(steps: int, state: numpy.ndarray) numpy.ndarray [source]¶
Compute the action for the given state at the given timestep.
- Parameters
steps (int) – timesteps since the beginning of exploration
state (np.ndarray) – current state to compute the action
- Returns
action for current state at given timestep
- Return type
np.ndarray
- rollout(env: gym.core.Env) List[Tuple[Union[numpy.ndarray, Tuple[numpy.ndarray, ...]], numpy.ndarray, float, float, Union[numpy.ndarray, Tuple[numpy.ndarray, ...]], Dict[str, Any]]] [source]¶
Roll out an episode in the given env.
- Parameters
env (gym.Env) – Environment
- Returns
List of experiences.
Each experience is a tuple of (state, action, reward, terminal flag, next state, extra info).
- Return type
List[Experience]
- step(env: gym.core.Env, n: int = 1, break_if_done: bool = False) List[Tuple[Union[numpy.ndarray, Tuple[numpy.ndarray, ...]], numpy.ndarray, float, float, Union[numpy.ndarray, Tuple[numpy.ndarray, ...]], Dict[str, Any]]] [source]¶
Step n timesteps in the given env.
- Parameters
env (gym.Env) – Environment
n (int) – Number of timesteps to act in the environment
- Returns
List of experiences.
Each experience is a tuple of (state, action, reward, terminal flag, next state, extra info).
- Return type
List[Experience]
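The shape of the experience list returned by rollout and step can be illustrated with a small stand-alone loop. This is only a sketch and not the library's implementation: CountdownEnv, the rollout function, and select_action are hypothetical names, the environment follows the classic gym reset/step shape, and encoding the terminal flag as 1.0 when the episode ends is an assumption of this sketch (check the library source for the actual convention).

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical stand-in for the Experience tuple described above:
# (state, action, reward, terminal flag, next state, extra info)
Experience = Tuple[List[float], int, float, float, List[float], Dict[str, Any]]


class CountdownEnv:
    """Hypothetical toy environment that ends after a fixed number of steps."""

    def __init__(self, length: int = 3) -> None:
        self._length = length
        self._remaining = length

    def reset(self) -> List[float]:
        self._remaining = self._length
        return [float(self._remaining)]

    def step(self, action: int) -> Tuple[List[float], float, bool, Dict[str, Any]]:
        self._remaining -= 1
        done = self._remaining == 0
        return [float(self._remaining)], 1.0, done, {}


def rollout(env: CountdownEnv,
            select_action: Callable[[int, List[float]], int]) -> List[Experience]:
    """Run one episode and collect one experience tuple per timestep."""
    experiences: List[Experience] = []
    state = env.reset()
    steps = 0
    done = False
    while not done:
        action = select_action(steps, state)
        next_state, reward, done, info = env.step(action)
        # Assumption of this sketch: the terminal flag is 1.0 on the last step.
        experiences.append((state, action, reward, float(done), next_state, info))
        state = next_state
        steps += 1
    return experiences


# Always pick action 0; the toy environment ignores the action anyway.
episode = rollout(CountdownEnv(length=3), lambda steps, state: 0)
```

Here select_action plays the role of the explorer's action(steps, state) method, which is why it receives the step count as well as the state.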
LinearDecayEpsilonGreedyExplorer
- class nnabla_rl.environment_explorers.LinearDecayEpsilonGreedyExplorer(greedy_action_selector: Callable[[numpy.ndarray], Tuple[numpy.ndarray, Dict]], random_action_selector: Callable[[numpy.ndarray], Tuple[numpy.ndarray, Dict]], env_info: nnabla_rl.environments.environment_info.EnvironmentInfo, config: nnabla_rl.environment_explorers.epsilon_greedy_explorer.LinearDecayEpsilonGreedyExplorerConfig = LinearDecayEpsilonGreedyExplorerConfig(warmup_random_steps=0, reward_scalar=1.0, timelimit_as_terminal=True, initial_step_num=0, initial_epsilon=1.0, final_epsilon=0.05, max_explore_steps=1000000))[source]¶
Linear decay epsilon-greedy explorer
Epsilon-greedy style explorer. Epsilon is linearly decayed until max_explore_steps set in the config.
- Parameters
greedy_action_selector (Callable[[np.ndarray], Tuple[np.ndarray, Dict]]) – callable which computes greedy action with respect to current state.
random_action_selector (Callable[[np.ndarray], Tuple[np.ndarray, Dict]]) – callable which computes random action that can be executed in the environment.
env_info (EnvironmentInfo) – environment info
config (LinearDecayEpsilonGreedyExplorerConfig) – the config of this class.
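A minimal sketch of the idea, assuming epsilon is interpolated linearly from initial_epsilon to final_epsilon over max_explore_steps and held at final_epsilon afterwards. The exact schedule used by the library is not stated in this reference, so the formula below is an assumption. The helper names linear_decay_epsilon and epsilon_greedy are hypothetical, and the defaults mirror the config defaults shown in the signature.

```python
import random
from typing import Any, Callable, Dict, List, Tuple

# Same shape as the selectors documented above: state -> (action, extra info).
Selector = Callable[[List[float]], Tuple[Any, Dict]]


def linear_decay_epsilon(steps: int,
                         initial_epsilon: float = 1.0,
                         final_epsilon: float = 0.05,
                         max_explore_steps: int = 1_000_000) -> float:
    """Linearly interpolate epsilon, clamped once max_explore_steps is reached."""
    fraction = min(max(steps, 0) / max_explore_steps, 1.0)
    return initial_epsilon + (final_epsilon - initial_epsilon) * fraction


def epsilon_greedy(state: List[float],
                   epsilon: float,
                   greedy_action_selector: Selector,
                   random_action_selector: Selector,
                   rng: random.Random) -> Tuple[Any, Dict]:
    """With probability epsilon take the random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return random_action_selector(state)
    return greedy_action_selector(state)


rng = random.Random(0)
greedy = lambda state: (1, {})                    # hypothetical greedy selector
uniform = lambda state: (rng.randrange(2), {})    # hypothetical random selector
action, info = epsilon_greedy([0.0], linear_decay_epsilon(500_000), greedy, uniform, rng)
```

Under these assumptions, epsilon is about 0.525 halfway through the schedule and stays at 0.05 for any step count beyond max_explore_steps.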
GaussianExplorer
- class nnabla_rl.environment_explorers.GaussianExplorer(policy_action_selector: Callable[[numpy.ndarray], Tuple[numpy.ndarray, Dict]], env_info: nnabla_rl.environments.environment_info.EnvironmentInfo, config: nnabla_rl.environment_explorers.gaussian_explorer.GaussianExplorerConfig = GaussianExplorerConfig(warmup_random_steps=0, reward_scalar=1.0, timelimit_as_terminal=True, initial_step_num=0, action_clip_low=2.2250738585072014e-308, action_clip_high=1.7976931348623157e+308, sigma=1.0))[source]¶
Gaussian explorer
Explore using the policy’s action with Gaussian noise added to it. The policy’s action must be continuous.
- Parameters
policy_action_selector (Callable[[np.ndarray], Tuple[np.ndarray, Dict]]) – callable which computes current policy’s action with respect to current state.
env_info (EnvironmentInfo) – environment info
config (GaussianExplorerConfig) – the config of this class.
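The kind of perturbation described here can be sketched in plain Python. This is an illustration only: add_gaussian_noise is a hypothetical helper, and the assumption that zero-mean noise with standard deviation sigma is added first and the result is then clipped to [action_clip_low, action_clip_high] is inferred from the config field names, not confirmed by this reference.

```python
import random
from typing import List


def add_gaussian_noise(action: List[float],
                       sigma: float,
                       clip_low: float,
                       clip_high: float,
                       rng: random.Random) -> List[float]:
    """Add independent zero-mean Gaussian noise to each action dimension, then clip."""
    noisy = [a + rng.gauss(0.0, sigma) for a in action]
    # Assumption of this sketch: clipping happens after the noise is added.
    return [min(max(a, clip_low), clip_high) for a in noisy]


rng = random.Random(0)
policy_action = [0.5, -0.5]  # hypothetical continuous action from the policy
explored_action = add_gaussian_noise(policy_action, sigma=0.1,
                                     clip_low=-1.0, clip_high=1.0, rng=rng)
```

Additive noise of this form only makes sense for real-valued actions, which is consistent with the requirement that the policy's action be continuous.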
RawPolicyExplorer
- class nnabla_rl.environment_explorers.RawPolicyExplorer(policy_action_selector: Callable[[numpy.ndarray], Tuple[numpy.ndarray, Dict]], env_info: nnabla_rl.environments.environment_info.EnvironmentInfo, config: nnabla_rl.environment_explorers.raw_policy_explorer.RawPolicyExplorerConfig = RawPolicyExplorerConfig(warmup_random_steps=0, reward_scalar=1.0, timelimit_as_terminal=True, initial_step_num=0))[source]¶
Raw policy explorer
Explore using the policy’s action without any changes.
- Parameters
policy_action_selector (Callable[[np.ndarray], Tuple[np.ndarray, Dict]]) – callable which computes current policy’s action with respect to current state.
env_info (EnvironmentInfo) – environment info
config (RawPolicyExplorerConfig) – the config of this class.