ReplayBuffers

All replay buffers are derived from nnabla_rl.replay_buffer.ReplayBuffer

ReplayBuffer

class nnabla_rl.replay_buffer.ReplayBuffer(capacity: int | None = None)[source]
append(experience: Tuple[ndarray | Tuple[ndarray, ...], ndarray, float | ndarray, float, ndarray | Tuple[ndarray, ...], Dict[str, Any]])[source]

Add new experience to the replay buffer.

Parameters:

experience (array-like) – Experience includes a transition, such as state, action, reward, and whether the episode has terminated or not. See the [Replay buffer documents](replay_buffer.md) for more information.

Notes

If the replay buffer is full, the oldest experience (at the head of the buffer) will be dropped and the given experience will be appended to the tail of the buffer.
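A minimal usage sketch (not taken from the library docs): the tuple layout (state, action, reward, non_terminal, next_state, info) is an assumption inferred from the type annotation above, and the shapes and values are purely illustrative.

```python
import numpy as np

from nnabla_rl.replay_buffer import ReplayBuffer

buffer = ReplayBuffer(capacity=100000)

# Assumed layout inferred from the type annotation above:
# (state, action, reward, non_terminal, next_state, info)
state = np.zeros(4, dtype=np.float32)
action = np.array([0.5], dtype=np.float32)
reward = 1.0
non_terminal = 1.0  # assumption: 0.0 marks the end of an episode
next_state = np.ones(4, dtype=np.float32)
info = {}

buffer.append((state, action, reward, non_terminal, next_state, info))
```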

append_all(experiences: Sequence[Tuple[ndarray | Tuple[ndarray, ...], ndarray, float | ndarray, float, ndarray | Tuple[ndarray, ...], Dict[str, Any]]])[source]

Add list of experiences to the replay buffer.

Parameters:

experiences (Sequence[Experience]) – Sequence of experiences to insert into the buffer

Notes

If the replay buffer is full, the oldest experiences (at the head of the buffer) will be dropped and the given experiences will be appended to the tail of the buffer.

property capacity: int | None

Capacity (max length) of this replay buffer, or None if no capacity is set.

sample(num_samples: int = 1, num_steps: int = 1) Tuple[Sequence[Tuple[ndarray | Tuple[ndarray, ...], ndarray, float | ndarray, float, ndarray | Tuple[ndarray, ...], Dict[str, Any]]] | Tuple[Sequence[Tuple[ndarray | Tuple[ndarray, ...], ndarray, float | ndarray, float, ndarray | Tuple[ndarray, ...], Dict[str, Any]]], ...], Dict[str, Any]][source]

Randomly sample num_samples experiences from the replay buffer.

Parameters:
  • num_samples (int) – Number of samples to sample from the replay buffer. Defaults to 1.

  • num_steps (int) – Number of timesteps to sample. Should be greater than 0. Defaults to 1.

Returns:

Randomly sampled num_samples of experiences. If num_steps is greater than 1, a tuple of size num_steps is returned, where each entry contains num_samples of experiences.

info (Dict[str, Any]): dictionary of information about experiences.

Return type:

experiences (Sequence[Experience] or Tuple[Sequence[Experience], …])

Raises:

ValueError – num_samples exceeds the maximum possible index or num_steps is 0 or negative.

Notes

Sampling strategy depends on the underlying implementation.
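A short sampling sketch under the same assumptions as above; the buffer is assumed to already hold enough experiences, and the batch size of 32 is illustrative.

```python
# Single-step sampling: a batch of 32 experiences plus an info dictionary.
experiences, info = buffer.sample(num_samples=32)

# Multi-step sampling: per the Returns section above, the result is a tuple
# of length num_steps, each entry holding 32 experiences.
multi_step_experiences, info = buffer.sample(num_samples=32, num_steps=3)
first_step_batch = multi_step_experiences[0]
```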

sample_indices(indices: Sequence[int], num_steps: int = 1) Tuple[Sequence[Tuple[ndarray | Tuple[ndarray, ...], ndarray, float | ndarray, float, ndarray | Tuple[ndarray, ...], Dict[str, Any]]] | Tuple[Sequence[Tuple[ndarray | Tuple[ndarray, ...], ndarray, float | ndarray, float, ndarray | Tuple[ndarray, ...], Dict[str, Any]]], ...], Dict[str, Any]][source]

Sample experiences for given indices from the replay buffer.

Parameters:
  • indices (array-like) – List of buffer indices to sample the data from.

  • num_steps (int) – Number of timesteps to sample. Should be greater than 0. Defaults to 1.

Returns:

Experiences at the given indices. If num_steps is greater than 1, a tuple of size num_steps is returned, where each entry contains one experience per given index.

info (Dict[str, Any]): dictionary of information about experiences.

Return type:

experiences (Sequence[Experience] or Tuple[Sequence[Experience], …])

Raises:

ValueError – If indices are empty or num_steps is 0 or negative.
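For completeness, a brief sketch of sample_indices, which retrieves the experiences stored at the given buffer positions instead of sampling randomly; it continues the example above and the index values are illustrative.

```python
# Fetch the experiences stored at positions 0, 5 and 10 of the buffer.
experiences, info = buffer.sample_indices(indices=[0, 5, 10])
```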

List of ReplayBuffer

class nnabla_rl.replay_buffers.DecorableReplayBuffer(capacity, decor_fun)[source]

Bases: ReplayBuffer

Buffer which can decorate the experience with an external decoration function.

This buffer enables decorating the experience before the item is used for building a batch. The decoration function will be called when __getitem__ is called. You can use this buffer to augment the data or add noise to the experience.
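A sketch of a decoration function, assuming decor_fun receives a single experience tuple and returns the decorated tuple; the tuple layout and the noise scale are illustrative assumptions, not the library's definition.

```python
import numpy as np

from nnabla_rl.replay_buffers import DecorableReplayBuffer

def add_observation_noise(experience):
    # Assumed layout: (state, action, reward, non_terminal, next_state, info)
    state, action, reward, non_terminal, next_state, info = experience
    noisy_state = state + np.random.normal(scale=0.01, size=state.shape)
    return (noisy_state, action, reward, non_terminal, next_state, info)

# The decoration function is applied when __getitem__ builds a batch item.
buffer = DecorableReplayBuffer(capacity=100000, decor_fun=add_observation_noise)
```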

class nnabla_rl.replay_buffers.HindsightReplayBuffer(reward_function: Callable[[ndarray, ndarray, Dict[str, Any]], Any], hindsight_prob: float = 0.8, capacity: int | None = None)[source]

Bases: ReplayBuffer
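A construction sketch based on the signature above. The reward_function below assumes the two ndarray arguments are an achieved goal and a desired goal; that interpretation and the sparse reward itself are illustrative assumptions.

```python
import numpy as np

from nnabla_rl.replay_buffers import HindsightReplayBuffer

def reward_function(achieved_goal, desired_goal, info):
    # Illustrative sparse goal-reaching reward: 0 when close enough, else -1.
    distance = np.linalg.norm(achieved_goal - desired_goal)
    return 0.0 if distance < 0.05 else -1.0

buffer = HindsightReplayBuffer(reward_function=reward_function,
                               hindsight_prob=0.8,
                               capacity=1000000)
```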

append(experience: Tuple[ndarray | Tuple[ndarray, ...], ndarray, float | ndarray, float, ndarray | Tuple[ndarray, ...], Dict[str, Any]])[source]

Add new experience to the replay buffer.

Parameters:

experience (array-like) – Experience includes a transition, such as state, action, reward, and whether the episode has terminated or not. See the [Replay buffer documents](replay_buffer.md) for more information.

Notes

If the replay buffer is full, the oldest experience (at the head of the buffer) will be dropped and the given experience will be appended to the tail of the buffer.

sample_indices(indices: Sequence[int], num_steps: int = 1) Tuple[Sequence[Tuple[ndarray | Tuple[ndarray, ...], ndarray, float | ndarray, float, ndarray | Tuple[ndarray, ...], Dict[str, Any]]], Dict[str, Any]][source]

Sample experiences for given indices from the replay buffer.

Parameters:
  • indices (array-like) – List of buffer indices to sample the data from.

  • num_steps (int) – Number of timesteps to sample. Should be greater than 0. Defaults to 1.

Returns:

Experiences at the given indices. If num_steps is greater than 1, a tuple of size num_steps is returned, where each entry contains one experience per given index.

info (Dict[str, Any]): dictionary of information about experiences.

Return type:

experiences (Sequence[Experience] or Tuple[Sequence[Experience], …])

Raises:

ValueError – If indices are empty or num_steps is 0 or negative.

class nnabla_rl.replay_buffers.MemoryEfficientAtariBuffer(capacity: int, stacked_frames: int = 4)[source]

Bases: ReplayBuffer

Buffer designed to compactly save experiences of Atari environments used in DQN.

DQN (and other training algorithms) requires a large replay buffer when training on Atari games. If you naively save the experiences, you will need more than 100GB to save them (assuming 1M experiences), which usually does not fit in the machine's memory (unless you have money :). This replay buffer reduces the size of an experience by casting the images to uint8 and by removing the old frames concatenated to the observation. By using this buffer, you can hold 1M experiences using only approximately 20GB of memory. Note that this class is designed only for DQN-style training on Atari environments (i.e. the state consists of stacked_frames number of concatenated grayscale frames and its values are normalized between 0 and 1).
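A construction sketch using only the signature above; the capacity and stacked_frames values mirror the typical DQN setup and are illustrative.

```python
from nnabla_rl.replay_buffers import MemoryEfficientAtariBuffer

# Holds roughly 1M Atari transitions in about 20GB by storing frames as uint8
# and not duplicating the frames stacked into each observation.
buffer = MemoryEfficientAtariBuffer(capacity=1000000, stacked_frames=4)
```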

append(experience)[source]

Add new experience to the replay buffer.

Parameters:

experience (array-like) – Experience includes a transition, such as state, action, reward, and whether the episode has terminated or not. See the [Replay buffer documents](replay_buffer.md) for more information.

Notes

If the replay buffer is full, the oldest experience (at the head of the buffer) will be dropped and the given experience will be appended to the tail of the buffer.

class nnabla_rl.replay_buffers.MemoryEfficientAtariTrajectoryBuffer(num_trajectories=None)[source]

Bases: TrajectoryReplayBuffer

sample(num_samples: int = 1, num_steps: int = 1)[source]

Randomly sample num_samples experiences from the replay buffer.

Parameters:
  • num_samples (int) – Number of samples to sample from the replay buffer. Defaults to 1.

  • num_steps (int) – Number of timesteps to sample. Should be greater than 0. Defaults to 1.

Returns:

Randomly sampled num_samples of experiences. If num_steps is greater than 1, a tuple of size num_steps is returned, where each entry contains num_samples of experiences.

info (Dict[str, Any]): dictionary of information about experiences.

Return type:

experiences (Sequence[Experience] or Tuple[Sequence[Experience], …])

Raises:

ValueError – num_samples exceeds the maximum possible index or num_steps is 0 or negative.

Notes

Sampling strategy depends on the underlying implementation.

sample_indices(indices: Sequence[int], num_steps: int = 1)[source]

Sample experiences for given indices from the replay buffer.

Parameters:
  • indices (array-like) – List of buffer indices to sample the data from.

  • num_steps (int) – Number of timesteps to sample. Should be greater than 0. Defaults to 1.

Returns:

Experiences at the given indices. If num_steps is greater than 1, a tuple of size num_steps is returned, where each entry contains one experience per given index.

info (Dict[str, Any]): dictionary of information about experiences.

Return type:

experiences (Sequence[Experience] or Tuple[Sequence[Experience], …])

Raises:

ValueError – If indices are empty or num_steps is 0 or negative.

class nnabla_rl.replay_buffers.PrioritizedReplayBuffer(capacity: int, alpha: float = 0.6, beta: float = 0.4, betasteps: int = 10000, error_clip: Tuple[float, float] | None = (-1, 1), epsilon: float = 1e-08, reset_segment_interval: int = 1000, sort_interval: int = 1000000, variant: str = 'proportional')[source]

Bases: ReplayBuffer
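A construction sketch using the defaults shown in the signature above; the comment on beta annealing is a common interpretation of betasteps in prioritized experience replay and is stated here as an assumption.

```python
from nnabla_rl.replay_buffers import PrioritizedReplayBuffer

# Proportional prioritization with the signature's default hyperparameters.
# alpha scales how strongly priorities affect sampling; beta is the
# importance-sampling correction, assumed to anneal over betasteps steps.
buffer = PrioritizedReplayBuffer(capacity=1000000,
                                 alpha=0.6,
                                 beta=0.4,
                                 betasteps=10000,
                                 variant='proportional')
```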

append(experience)[source]

Add new experience to the replay buffer.

Parameters:

experience (array-like) – Experience includes a transition, such as state, action, reward, and whether the episode has terminated or not. See the [Replay buffer documents](replay_buffer.md) for more information.

Notes

If the replay buffer is full, the oldest experience (at the head of the buffer) will be dropped and the given experience will be appended to the tail of the buffer.

append_all(experiences)[source]

Add list of experiences to the replay buffer.

Parameters:

experiences (Sequence[Experience]) – Sequence of experiences to insert into the buffer

Notes

If the replay buffer is full, the oldest experiences (at the head of the buffer) will be dropped and the given experiences will be appended to the tail of the buffer.

property capacity

Capacity (max length) of this replay buffer, or None if no capacity is set.

sample(num_samples: int = 1, num_steps: int = 1)[source]

Randomly sample num_samples experiences from the replay buffer.

Parameters:
  • num_samples (int) – Number of samples to sample from the replay buffer. Defaults to 1.

  • num_steps (int) – Number of timesteps to sample. Should be greater than 0. Defaults to 1.

Returns:

Randomly sampled num_samples of experiences. If num_steps is greater than 1, a tuple of size num_steps is returned, where each entry contains num_samples of experiences.

info (Dict[str, Any]): dictionary of information about experiences.

Return type:

experiences (Sequence[Experience] or Tuple[Sequence[Experience], …])

Raises:

ValueError – num_samples exceeds the maximum possible index or num_steps is 0 or negative.

Notes

Sampling strategy depends on the underlying implementation.

sample_indices(indices: Sequence[int], num_steps: int = 1)[source]

Sample experiences for given indices from the replay buffer.

Parameters:
  • indices (array-like) – List of buffer indices to sample the data from.

  • num_steps (int) – Number of timesteps to sample. Should be greater than 0. Defaults to 1.

Returns:

Experiences at the given indices. If num_steps is greater than 1, a tuple of size num_steps is returned, where each entry contains one experience per given index.

info (Dict[str, Any]): dictionary of information about experiences.

Return type:

experiences (Sequence[Experience] or Tuple[Sequence[Experience], …])

Raises:

ValueError – If indices are empty or num_steps is 0 or negative.

class nnabla_rl.replay_buffers.TrajectoryReplayBuffer(num_trajectories=None)[source]

Bases: ReplayBuffer

TrajectoryReplayBuffer.

Enables appending/sampling not only a single experience but also a whole trajectory to/from the buffer. In order to append/sample a trajectory, the environment must be a finite-horizon setting.
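A sampling sketch against the methods documented below; how the trajectories get into the buffer is not covered by this sketch, so the buffer is assumed to already hold complete finite-horizon trajectories, and num_trajectories/num_samples are illustrative.

```python
from nnabla_rl.replay_buffers import TrajectoryReplayBuffer

buffer = TrajectoryReplayBuffer(num_trajectories=1000)

# ... fill the buffer with finite-horizon trajectories ...

# Sample 4 complete trajectories together with an info dictionary.
trajectories, info = buffer.sample_trajectories(num_samples=4)
```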

append(experience: Tuple[ndarray | Tuple[ndarray, ...], ndarray, float | ndarray, float, ndarray | Tuple[ndarray, ...], Dict[str, Any]])[source]

Add new experience to the replay buffer.

Parameters:

experience (array-like) – Experience includes a transition, such as state, action, reward, and whether the episode has terminated or not. See the [Replay buffer documents](replay_buffer.md) for more information.

Notes

If the replay buffer is full, the oldest experience (at the head of the buffer) will be dropped and the given experience will be appended to the tail of the buffer.

append_all(experiences: Sequence[Tuple[ndarray | Tuple[ndarray, ...], ndarray, float | ndarray, float, ndarray | Tuple[ndarray, ...], Dict[str, Any]]])[source]

Add list of experiences to the replay buffer.

Parameters:

experiences (Sequence[Experience]) – Sequence of experiences to insert into the buffer

Notes

If the replay buffer is full, the oldest experiences (at the head of the buffer) will be dropped and the given experiences will be appended to the tail of the buffer.

sample_indices(indices: Sequence[int], num_steps: int = 1) Tuple[Sequence[Tuple[ndarray | Tuple[ndarray, ...], ndarray, float | ndarray, float, ndarray | Tuple[ndarray, ...], Dict[str, Any]]] | Tuple[Sequence[Tuple[ndarray | Tuple[ndarray, ...], ndarray, float | ndarray, float, ndarray | Tuple[ndarray, ...], Dict[str, Any]]], ...], Dict[str, Any]][source]

Sample experiences for given indices from the replay buffer.

Parameters:
  • indices (array-like) – List of buffer indices to sample the data from.

  • num_steps (int) – Number of timesteps to sample. Should be greater than 0. Defaults to 1.

Returns:

Experiences at the given indices. If num_steps is greater than 1, a tuple of size num_steps is returned, where each entry contains one experience per given index.

info (Dict[str, Any]): dictionary of information about experiences.

Return type:

experiences (Sequence[Experience] or Tuple[Sequence[Experience], …])

Raises:

ValueError – If indices are empty or num_steps is 0 or negative.

sample_indices_portion(indices: Sequence[int], portion_length: int = 1) Tuple[Tuple[Sequence[Tuple[ndarray | Tuple[ndarray, ...], ndarray, float | ndarray, float, ndarray | Tuple[ndarray, ...], Dict[str, Any]]], ...], Dict[str, Any]][source]

Sample trajectory portions from the buffer (i.e. each sampled trajectory's length will be portion_length). The trajectory from the given index to index + portion_length - 1 will be sampled. The index should be the index of an experience in the buffer, not a trajectory's index. For example, if this buffer has 10 trajectories which consist of 5 experiences each, then index 0 is the first experience of the first trajectory, index 1 is the second experience of the first trajectory, index 5 is the first experience of the second trajectory, and index 6 is the second experience of the second trajectory. If index + portion_length exceeds the length of a trajectory, this will sample from len(trajectory) - portion_length to len(trajectory) - 1.

Parameters:
  • indices (array-like) – Indices of the experiences to sample from the replay buffer.

  • portion_length (int) – Length of each sampled trajectory. Defaults to 1.

Returns:

Trajectory portions of length portion_length starting at the given indices. info (Dict[str, Any]): dictionary of information about trajectories.

Return type:

trajectories (Tuple[Trajectory, …])

Raises:

RuntimeError – Trajectory’s length is below portion_length.
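Continuing the TrajectoryReplayBuffer sketch above, a brief illustration of the index arithmetic described in this method, assuming the buffer holds 10 trajectories of 5 experiences each (the same example as in the description):

```python
# Index 0 addresses the first experience of the first trajectory and index 6
# the second experience of the second trajectory. Each returned portion is a
# window of 3 consecutive experiences starting at the given index (shifted
# back when index + portion_length would run past the trajectory's end).
portions, info = buffer.sample_indices_portion(indices=[0, 6], portion_length=3)
```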

sample_trajectories(num_samples: int = 1) Tuple[Sequence[Tuple[ndarray | Tuple[ndarray, ...], ndarray, float | ndarray, float, ndarray | Tuple[ndarray, ...], Dict[str, Any]]] | Tuple[Sequence[Tuple[ndarray | Tuple[ndarray, ...], ndarray, float | ndarray, float, ndarray | Tuple[ndarray, ...], Dict[str, Any]]], ...], Dict[str, Any]][source]

Randomly sample num_samples trajectories from the replay buffer.

Parameters:

num_samples (int) – Number of samples to sample from the replay buffer. Defaults to 1.

Returns:

Randomly sampled num_samples of trajectories. info (Dict[str, Any]): dictionary of information about trajectories.

Return type:

trajectories (Tuple[Trajectory, …])

Raises:

ValueError – num_samples exceeds the maximum possible number of trajectories or num_steps is 0 or negative.

sample_trajectories_portion(num_samples: int = 1, portion_length: int = 1) Tuple[Tuple[Sequence[Tuple[ndarray | Tuple[ndarray, ...], ndarray, float | ndarray, float, ndarray | Tuple[ndarray, ...], Dict[str, Any]]], ...], Dict[str, Any]][source]

Randomly sample num_samples trajectories with length portion_length from the replay buffer (i.e. each sampled trajectory's length will be portion_length). A trajectory is sampled as follows. First, a trajectory is sampled with probability proportional to its length. Then, a random initial index between 0 and len(sampled_trajectory) - portion_length is sampled. Finally, the portion_length sized trajectory starting from the sampled initial index is returned.

Parameters:
  • num_samples (int) – Number of samples to sample from the replay buffer. Defaults to 1.

  • portion_length (int) – Length of each sampled trajectory. Defaults to 1.

Returns:

Randomly sampled num_samples of trajectories with given portion_length. info (Dict[str, Any]): dictionary of information about trajectories.

Return type:

trajectories (Tuple[Trajectory, …])

Raises:
  • ValueError – num_samples exceeds the maximum possible number of trajectories or num_steps is 0 or negative.

  • RuntimeError – Trajectory’s length is below portion_length.
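Continuing the TrajectoryReplayBuffer sketch above, a short usage sketch of this method; num_samples and portion_length are illustrative values.

```python
# Sample 4 windows of 8 consecutive experiences each. Trajectories are chosen
# with probability proportional to their length, then a random start index
# between 0 and len(trajectory) - portion_length is drawn for each window.
portions, info = buffer.sample_trajectories_portion(num_samples=4,
                                                    portion_length=8)
```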