ReplayBuffers

All replay buffers are derived from nnabla_rl.replay_buffer.ReplayBuffer

ReplayBuffer

class nnabla_rl.replay_buffer.ReplayBuffer(capacity: Optional[int] = None)[source]
append(experience: Tuple[Type[numpy.array], Type[numpy.array], float, float, Type[numpy.array], Dict[str, Any]])[source]

Add new experience to the replay buffer.

Parameters

experience (array-like) – An experience contains a transition, such as state, action, reward, and whether the episode has terminated or not. Please see the [Replay buffer documents](replay_buffer.md) for more information.

Notes

If the replay buffer is full, the oldest experience (at the head of the buffer) will be dropped and the given experience will be appended to the tail of the buffer.
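
A minimal usage sketch, assuming the experience tuple layout (state, action, reward, non_terminal, next_state, info) implied by the signature above:

    import numpy as np
    from nnabla_rl.replay_buffer import ReplayBuffer

    buffer = ReplayBuffer(capacity=100000)

    state = np.zeros((4, ))
    action = np.array([1])
    reward = 1.0
    non_terminal = 1.0  # assumed convention: 0.0 when the episode terminated at this step
    next_state = np.ones((4, ))
    info = {}

    buffer.append((state, action, reward, non_terminal, next_state, info))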

append_all(experiences: Sequence[Tuple[Type[numpy.array], Type[numpy.array], float, float, Type[numpy.array], Dict[str, Any]]])[source]

Add list of experiences to the replay buffer.

Parameters

experiences (Sequence[Experience]) – Sequence of experiences to insert into the buffer

Notes

If the replay buffer is full, the oldest experience (at the head of the buffer) will be dropped and the given experience will be appended to the tail of the buffer.
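
For example, a batch of transitions collected during a rollout can be inserted in a single call (a sketch reusing the buffer and the tuple layout assumed in the append() example above):

    import numpy as np

    experiences = [
        (np.zeros((4, )), np.array([0]), 0.0, 1.0, np.ones((4, )), {}),
        (np.ones((4, )), np.array([1]), 1.0, 0.0, np.zeros((4, )), {}),
    ]
    buffer.append_all(experiences)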

property capacity: Optional[int]

Capacity (max length) of this replay buffer, or None if the buffer is unbounded.

sample(num_samples: int = 1) → Tuple[Sequence[Tuple[Type[numpy.array], Type[numpy.array], float, float, Type[numpy.array], Dict[str, Any]]], Dict[str, Any]][source]

Randomly sample num_samples experiences from the replay buffer.

Parameters

num_samples (int) – Number of samples to sample from the replay buffer.

Returns

Randomly sampled num_samples of experiences, together with info (Dict[str, Any]): a dictionary of information about the experiences.

Return type

experiences (Sequence[Experience])

Notes

The sampling strategy depends on the underlying implementation.
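
A sampling sketch, assuming the buffer already holds at least num_samples experiences:

    # Draw a mini-batch of 32 experiences; sample() also returns an info dictionary.
    experiences, info = buffer.sample(num_samples=32)
    # Unpack the first sampled experience (tuple layout assumed as above).
    state, action, reward, non_terminal, next_state, extra = experiences[0]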

sample_indices(indices: Sequence[int]) → Tuple[Sequence[Tuple[Type[numpy.array], Type[numpy.array], float, float, Type[numpy.array], Dict[str, Any]]], Dict[str, Any]][source]

Sample experiences for given indices from the replay buffer.

Parameters

indices (array-like) – list of buffer indices of the data to sample

Returns

Sample of experiences for given indices.

Return type

experiences (array-like)

Raises

ValueError – If indices are empty
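
A sketch of fetching the experiences stored at specific positions in the buffer:

    # Retrieve the experiences at buffer positions 0, 5 and 10.
    experiences, info = buffer.sample_indices([0, 5, 10])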

List of ReplayBuffer

class nnabla_rl.replay_buffers.DecorableReplayBuffer(capacity, decor_fun)[source]

Bases: nnabla_rl.replay_buffer.ReplayBuffer

Buffer which can decorate the experience with an external decoration function

This buffer enables decorating an experience before the item is used for building a batch. The decoration function will be called when __getitem__ is called. You can use this buffer to augment the data or add noise to the experience.
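
A sketch of a decoration function that adds Gaussian noise to the stored state before it is used for batch building (the noise scale and the tuple layout are illustrative assumptions):

    import numpy as np
    from nnabla_rl.replay_buffers import DecorableReplayBuffer

    def add_state_noise(experience):
        # Assumed layout: (state, action, reward, non_terminal, next_state, info)
        state, action, reward, non_terminal, next_state, info = experience
        noisy_state = state + np.random.normal(scale=0.01, size=state.shape)
        return (noisy_state, action, reward, non_terminal, next_state, info)

    buffer = DecorableReplayBuffer(capacity=100000, decor_fun=add_state_noise)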

class nnabla_rl.replay_buffers.MemoryEfficientAtariBuffer(capacity)[source]

Bases: nnabla_rl.replay_buffer.ReplayBuffer

Buffer designed to compactly save experiences of Atari environments used in DQN.

DQN (and other training algorithms) requires a large replay buffer when training on Atari games. If you naively save the experiences, you will need more than 100GB to store them (assuming 1M experiences), which usually does not fit in the machine’s memory (unless you have money:). This replay buffer reduces the size of each experience by casting the images to uint8 and removing the old frames concatenated to the observation. By using this buffer, you can hold 1M experiences using only about 20GB of memory.

Note that this class is designed only for DQN-style training on Atari environments (i.e. the state consists of 4 concatenated grayscale frames and its values are normalized between 0 and 1).
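
A minimal construction sketch; the buffer is used in the same way as the base ReplayBuffer, but is intended for DQN-style Atari training as described above:

    from nnabla_rl.replay_buffers import MemoryEfficientAtariBuffer

    # Roughly 20GB of memory for 1M experiences with this buffer.
    buffer = MemoryEfficientAtariBuffer(capacity=1000000)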

append(experience)[source]

Add new experience to the replay buffer.

Parameters

experience (array-like) – An experience contains a transition, such as state, action, reward, and whether the episode has terminated or not. Please see the [Replay buffer documents](replay_buffer.md) for more information.

Notes

If the replay buffer is full, the oldest experience (at the head of the buffer) will be dropped and the given experience will be appended to the tail of the buffer.

append_all(experiences)[source]

Add list of experiences to the replay buffer.

Parameters

experiences (Sequence[Experience]) – Sequence of experiences to insert into the buffer

Notes

If the replay buffer is full, the oldest experience (at the head of the buffer) will be dropped and the given experience will be appended to the tail of the buffer.

class nnabla_rl.replay_buffers.PrioritizedReplayBuffer(capacity, alpha=0.6, beta=0.4, betasteps=10000, epsilon=1e-08)[source]

Bases: nnabla_rl.replay_buffer.ReplayBuffer
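
A construction sketch; the parameter roles described in the comments follow the standard prioritized experience replay formulation (Schaul et al.) and are an assumption, since this page does not document them:

    from nnabla_rl.replay_buffers import PrioritizedReplayBuffer

    buffer = PrioritizedReplayBuffer(
        capacity=1000000,
        alpha=0.6,        # how strongly priorities skew the sampling distribution
        beta=0.4,         # initial importance-sampling correction exponent
        betasteps=10000,  # number of steps over which beta is annealed towards 1.0
        epsilon=1e-08)    # small constant assumed to keep priorities strictly positive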

append(experience)[source]

Add new experience to the replay buffer.

Parameters

experience (array-like) – An experience contains a transition, such as state, action, reward, and whether the episode has terminated or not. Please see the [Replay buffer documents](replay_buffer.md) for more information.

Notes

If the replay buffer is full, the oldest experience (at the head of the buffer) will be dropped and the given experience will be appended to the tail of the buffer.

sample(num_samples=1)[source]

Randomly sample num_samples experiences from the replay buffer.

Parameters

num_samples (int) – Number of samples to sample from the replay buffer.

Returns

Randomly sampled num_samples of experiences, together with info (Dict[str, Any]): a dictionary of information about the experiences.

Return type

experiences (Sequence[Experience])

Notes

The sampling strategy depends on the underlying implementation.

sample_indices(indices)[source]

Sample experiences for given indices from the replay buffer.

Parameters

indices (array-like) – list of buffer indices of the data to sample

Returns

Sample of experiences for given indices.

Return type

experiences (array-like)

Raises

ValueError – If indices are empty