ReplayBuffers

All replay buffers are derived from nnabla_rl.replay_buffer.ReplayBuffer

ReplayBuffer

class nnabla_rl.replay_buffer.ReplayBuffer(capacity: Optional[int] = None)[source]
append(experience: Tuple[Union[ndarray, Tuple[ndarray, ...]], ndarray, float, float, Union[ndarray, Tuple[ndarray, ...]], Dict[str, Any]])[source]

Add new experience to the replay buffer.

Parameters

experience (array-like) – Experience includes transitions such as state, action, reward, and a flag indicating whether the episode has terminated or not. See the [Replay buffer documents](replay_buffer.md) for more information.

Notes

If the replay buffer is full, the oldest experience (at the head of the buffer) is dropped and the given experience is appended to the tail of the buffer.
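A minimal usage sketch (the experience tuple ordering of (state, action, reward, non_terminal, next_state, info) is inferred from the type annotation above; the array shapes are illustrative only):

```python
import numpy as np

from nnabla_rl.replay_buffer import ReplayBuffer

buffer = ReplayBuffer(capacity=100000)

state = np.random.randn(4).astype(np.float32)
action = np.array([1])
reward = 1.0
non_terminal = 1.0  # assumed convention: 0.0 when the episode has terminated
next_state = np.random.randn(4).astype(np.float32)
info = {}

# When the buffer is full, the oldest experience is dropped from the head.
buffer.append((state, action, reward, non_terminal, next_state, info))
```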

append_all(experiences: Sequence[Tuple[Union[ndarray, Tuple[ndarray, ...]], ndarray, float, float, Union[ndarray, Tuple[ndarray, ...]], Dict[str, Any]]])[source]

Add list of experiences to the replay buffer.

Parameters

experiences (Sequence[Experience]) – Sequence of experiences to insert into the buffer

Notes

If the replay buffer is full, the oldest experience (at the head of the buffer) is dropped and the given experience is appended to the tail of the buffer.
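Continuing the append example above, a sketch of inserting a batch of experiences at once (same assumed tuple layout):

```python
experiences = [
    (np.random.randn(4).astype(np.float32), np.array([0]), 0.0, 1.0,
     np.random.randn(4).astype(np.float32), {})
    for _ in range(10)
]
# Appends the whole sequence in order; the oldest entries are dropped
# if the capacity is exceeded.
buffer.append_all(experiences)
```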

property capacity: Optional[int]

Capacity (max length) of this replay buffer, or None if no capacity limit is set.

sample(num_samples: int = 1, num_steps: int = 1) Tuple[Union[Sequence[Tuple[Union[ndarray, Tuple[ndarray, ...]], ndarray, float, float, Union[ndarray, Tuple[ndarray, ...]], Dict[str, Any]]], Tuple[Sequence[Tuple[Union[ndarray, Tuple[ndarray, ...]], ndarray, float, float, Union[ndarray, Tuple[ndarray, ...]], Dict[str, Any]]], ...]], Dict[str, Any]][source]

Randomly sample num_samples experiences from the replay buffer.

Parameters
  • num_samples (int) – Number of samples to sample from the replay buffer. Defaults to 1.

  • num_steps (int) – Number of timesteps to sample. Should be greater than 0. Defaults to 1.

Returns

num_samples randomly sampled experiences. If num_steps is greater than 1, a tuple of size num_steps is returned, where each entry contains num_samples experiences.

info (Dict[str, Any]): dictionary of information about experiences.

Return type

experiences (Sequence[Experience] or Tuple[Sequence[Experience], …])

Raises

ValueError – num_samples exceeds the maximum possible index or num_steps is 0 or negative.

Notes

Sampling strategy depends on the underlying implementation.
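Continuing the example above, a sampling sketch; the exact contents of the returned info dictionary depend on the underlying buffer implementation:

```python
# Sample a batch of 32 single-step experiences.
experiences, info = buffer.sample(num_samples=32)

# With num_steps > 1, the first return value is a tuple of length num_steps,
# where each entry holds num_samples experiences.
multi_step_experiences, info = buffer.sample(num_samples=32, num_steps=3)
step_0, step_1, step_2 = multi_step_experiences
```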

sample_indices(indices: Sequence[int], num_steps: int = 1) Tuple[Union[Sequence[Tuple[Union[ndarray, Tuple[ndarray, ...]], ndarray, float, float, Union[ndarray, Tuple[ndarray, ...]], Dict[str, Any]]], Tuple[Sequence[Tuple[Union[ndarray, Tuple[ndarray, ...]], ndarray, float, float, Union[ndarray, Tuple[ndarray, ...]], Dict[str, Any]]], ...]], Dict[str, Any]][source]

Sample experiences for given indices from the replay buffer.

Parameters
indices (array-like) – list of buffer indices of the data to sample

num_steps (int) – Number of timesteps to sample. Should be greater than 0. Defaults to 1.

Returns

Experiences at the given indices. If num_steps is greater than 1, a tuple of size num_steps is returned, where each entry contains the experiences for the given indices.

info (Dict[str, Any]): dictionary of information about experiences.

Return type

experiences (Sequence[Experience] or Tuple[Sequence[Experience], …])

Raises

ValueError – If indices are empty or num_steps is 0 or negative.
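Continuing the example above, a sketch of retrieving experiences stored at specific positions in the buffer:

```python
# Fetch the experiences at buffer indices 0, 5 and 10.
experiences, info = buffer.sample_indices([0, 5, 10])
```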

List of ReplayBuffer

class nnabla_rl.replay_buffers.DecorableReplayBuffer(capacity, decor_fun)[source]

Bases: ReplayBuffer

Buffer which can decorate experiences with an external decoration function.

This buffer decorates each experience before the item is used for building a batch. The decoration function is called when __getitem__ is called. You can use this buffer to augment the data or to add noise to the experience.
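A sketch of a decoration function that adds Gaussian noise to the state; the assumption that the function receives and returns a single experience tuple in the (state, action, reward, non_terminal, next_state, info) layout follows from the type annotations above:

```python
import numpy as np

from nnabla_rl.replay_buffers import DecorableReplayBuffer

def add_state_noise(experience):
    # Assumed layout: (state, action, reward, non_terminal, next_state, info)
    state, action, reward, non_terminal, next_state, info = experience
    noisy_state = state + np.random.normal(scale=0.01, size=state.shape)
    return (noisy_state, action, reward, non_terminal, next_state, info)

buffer = DecorableReplayBuffer(capacity=100000, decor_fun=add_state_noise)
```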

class nnabla_rl.replay_buffers.HindsightReplayBuffer(reward_function: Callable[[ndarray, ndarray, Dict[str, Any]], Any], hindsight_prob: float = 0.8, capacity: Optional[int] = None)[source]

Bases: ReplayBuffer
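A construction sketch; the reward function below is hypothetical, and the interpretation of its arguments as (achieved goal, desired goal, extra info) is an assumption based on the Callable signature above:

```python
import numpy as np

from nnabla_rl.replay_buffers import HindsightReplayBuffer

def goal_reward(achieved_goal, desired_goal, info):
    # Hypothetical sparse reward: 0 when the goal is reached, -1 otherwise.
    return 0.0 if np.allclose(achieved_goal, desired_goal, atol=0.05) else -1.0

buffer = HindsightReplayBuffer(reward_function=goal_reward,
                               hindsight_prob=0.8,
                               capacity=1000000)
```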

append(experience: Tuple[Union[ndarray, Tuple[ndarray, ...]], ndarray, float, float, Union[ndarray, Tuple[ndarray, ...]], Dict[str, Any]])[source]

Add new experience to the replay buffer.

Parameters

experience (array-like) – Experience includes transitions such as state, action, reward, and a flag indicating whether the episode has terminated or not. See the [Replay buffer documents](replay_buffer.md) for more information.

Notes

If the replay buffer is full, the oldest experience (at the head of the buffer) is dropped and the given experience is appended to the tail of the buffer.

sample_indices(indices: Sequence[int], num_steps: int = 1) Tuple[Sequence[Tuple[Union[ndarray, Tuple[ndarray, ...]], ndarray, float, float, Union[ndarray, Tuple[ndarray, ...]], Dict[str, Any]]], Dict[str, Any]][source]

Sample experiences for given indices from the replay buffer.

Parameters
indices (array-like) – list of buffer indices of the data to sample

num_steps (int) – Number of timesteps to sample. Should be greater than 0. Defaults to 1.

Returns

Experiences at the given indices. If num_steps is greater than 1, a tuple of size num_steps is returned, where each entry contains the experiences for the given indices.

info (Dict[str, Any]): dictionary of information about experiences.

Return type

experiences (Sequence[Experience] or Tuple[Sequence[Experience], …])

Raises

ValueError – If indices are empty or num_steps is 0 or negative.

class nnabla_rl.replay_buffers.MemoryEfficientAtariBuffer(capacity: int, stacked_frames: int = 4)[source]

Bases: ReplayBuffer

Buffer designed to compactly save experiences of Atari environments used in DQN. DQN (and other training algorithms) requires a large replay buffer when training on Atari games. If you naively save the experiences, you will need more than 100 GB to store them (assuming 1M experiences), which usually does not fit in the machine’s memory (unless you have money to spare). This replay buffer reduces the size of each experience by casting the images to uint8 and removing the old frames concatenated to the observation. Using this buffer, you can hold 1M experiences with only about 20 GB of memory. Note that this class is designed only for DQN-style training on Atari environments (i.e. a state consists of stacked_frames concatenated grayscale frames whose values are normalized between 0 and 1).
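A construction sketch for DQN-style Atari training; the 1M capacity matches the description above:

```python
from nnabla_rl.replay_buffers import MemoryEfficientAtariBuffer

# Stores roughly 1M experiences in about 20 GB by keeping frames as uint8
# and not duplicating the stacked frame history.
buffer = MemoryEfficientAtariBuffer(capacity=1000000, stacked_frames=4)
```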

append(experience)[source]

Add new experience to the replay buffer.

Parameters

experience (array-like) – Experience includes transitions such as state, action, reward, and a flag indicating whether the episode has terminated or not. See the [Replay buffer documents](replay_buffer.md) for more information.

Notes

If the replay buffer is full, the oldest experience (at the head of the buffer) is dropped and the given experience is appended to the tail of the buffer.

class nnabla_rl.replay_buffers.PrioritizedReplayBuffer(capacity: int, alpha: float = 0.6, beta: float = 0.4, betasteps: int = 10000, error_clip: Optional[Tuple[float, float]] = (-1, 1), epsilon: float = 1e-08, reset_segment_interval: int = 1000, sort_interval: int = 1000000, variant: str = 'proportional')[source]

Bases: ReplayBuffer
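A construction sketch with the documented defaults spelled out; the comments interpret the parameters along the lines of the standard prioritized experience replay formulation, which is an assumption rather than something stated above:

```python
from nnabla_rl.replay_buffers import PrioritizedReplayBuffer

buffer = PrioritizedReplayBuffer(capacity=1000000,
                                 alpha=0.6,        # priority exponent (assumed meaning)
                                 beta=0.4,         # importance-sampling exponent (assumed meaning)
                                 betasteps=10000,  # steps over which beta is annealed (assumed meaning)
                                 variant='proportional')
```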

append(experience)[source]

Add new experience to the replay buffer.

Parameters

experience (array-like) – Experience includes transitions such as state, action, reward, and a flag indicating whether the episode has terminated or not. See the [Replay buffer documents](replay_buffer.md) for more information.

Notes

If the replay buffer is full, the oldest experience (at the head of the buffer) is dropped and the given experience is appended to the tail of the buffer.

append_all(experiences)[source]

Add list of experiences to the replay buffer.

Parameters

experiences (Sequence[Experience]) – Sequence of experiences to insert into the buffer

Notes

If the replay buffer is full, the oldest experience (at the head of the buffer) is dropped and the given experience is appended to the tail of the buffer.

property capacity

Capacity (max length) of this replay buffer, or None if no capacity limit is set.

sample(num_samples: int = 1, num_steps: int = 1)[source]

Randomly sample num_samples experiences from the replay buffer.

Parameters
  • num_samples (int) – Number of samples to sample from the replay buffer. Defaults to 1.

  • num_steps (int) – Number of timesteps to sample. Should be greater than 0. Defaults to 1.

Returns

num_samples randomly sampled experiences. If num_steps is greater than 1, a tuple of size num_steps is returned, where each entry contains num_samples experiences.

info (Dict[str, Any]): dictionary of information about experiences.

Return type

experiences (Sequence[Experience] or Tuple[Sequence[Experience], …])

Raises

ValueError – num_samples exceeds the maximum possible index or num_steps is 0 or negative.

Notes

Sampling strategy depends on the underlying implementation.

sample_indices(indices: Sequence[int], num_steps: int = 1)[source]

Sample experiences for given indices from the replay buffer.

Parameters
indices (array-like) – list of buffer indices of the data to sample

num_steps (int) – Number of timesteps to sample. Should be greater than 0. Defaults to 1.

Returns

Experiences at the given indices. If num_steps is greater than 1, a tuple of size num_steps is returned, where each entry contains the experiences for the given indices.

info (Dict[str, Any]): dictionary of information about experiences.

Return type

experiences (Sequence[Experience] or Tuple[Sequence[Experience], …])

Raises

ValueError – If indices are empty or num_steps is 0 or negative.