ReplayBuffers¶
All replay buffers are derived from nnabla_rl.replay_buffer.ReplayBuffer
ReplayBuffer¶
- class nnabla_rl.replay_buffer.ReplayBuffer(capacity: Optional[int] = None)[source]¶
- append(experience: Tuple[Union[numpy.ndarray, Tuple[numpy.ndarray, ...]], numpy.ndarray, float, float, Union[numpy.ndarray, Tuple[numpy.ndarray, ...]], Dict[str, Any]])[source]¶
Add new experience to the replay buffer.
- Parameters
experience (array-like) – Experience that includes transitions such as state, action, reward, and a flag indicating whether the episode has terminated. See [Replay buffer documents](replay_buffer.md) for more information.
Notes
If the replay buffer is full, the oldest experience (at the head of the buffer) is dropped and the given experience is appended to the tail of the buffer.
- append_all(experiences: Sequence[Tuple[Union[numpy.ndarray, Tuple[numpy.ndarray, ...]], numpy.ndarray, float, float, Union[numpy.ndarray, Tuple[numpy.ndarray, ...]], Dict[str, Any]]])[source]¶
Add list of experiences to the replay buffer.
- Parameters
experiences (Sequence[Experience]) – Sequence of experiences to insert to the buffer
Notes
If the replay buffer is full, the oldest experiences (at the head of the buffer) are dropped and the given experiences are appended to the tail of the buffer.
- property capacity: Optional[int]¶
Capacity (max length) of this replay buffer, or None if the buffer is unbounded.
- sample(num_samples: int = 1, num_steps: int = 1) Tuple[Union[Sequence[Tuple[Union[numpy.ndarray, Tuple[numpy.ndarray, ...]], numpy.ndarray, float, float, Union[numpy.ndarray, Tuple[numpy.ndarray, ...]], Dict[str, Any]]], Tuple[Sequence[Tuple[Union[numpy.ndarray, Tuple[numpy.ndarray, ...]], numpy.ndarray, float, float, Union[numpy.ndarray, Tuple[numpy.ndarray, ...]], Dict[str, Any]]], ...]], Dict[str, Any]] [source]¶
Randomly sample num_samples experiences from the replay buffer.
- Parameters
num_samples (int) – Number of samples to sample from the replay buffer. Defaults to 1.
num_steps (int) – Number of timesteps to sample. Should be greater than 0. Defaults to 1.
- Returns
- num_samples randomly sampled experiences. If num_steps is greater than 1, returns a tuple of size num_steps,
where each entry contains num_samples experiences.
info (Dict[str, Any]): dictionary of information about the sampled experiences.
- Return type
experiences (Sequence[Experience] or Tuple[Sequence[Experience], …])
- Raises
ValueError – num_samples exceeds the maximum possible index or num_steps is 0 or negative.
Notes
Sampling strategy depends on the underlying implementation.
- sample_indices(indices: Sequence[int], num_steps: int = 1) Tuple[Union[Sequence[Tuple[Union[numpy.ndarray, Tuple[numpy.ndarray, ...]], numpy.ndarray, float, float, Union[numpy.ndarray, Tuple[numpy.ndarray, ...]], Dict[str, Any]]], Tuple[Sequence[Tuple[Union[numpy.ndarray, Tuple[numpy.ndarray, ...]], numpy.ndarray, float, float, Union[numpy.ndarray, Tuple[numpy.ndarray, ...]], Dict[str, Any]]], ...]], Dict[str, Any]] [source]¶
Sample experiences for given indices from the replay buffer.
- Parameters
indices (array-like) – list of indices of the experiences to sample
num_steps (int) – Number of timesteps to sample. Should be greater than 0. Defaults to 1.
- Returns
- Experiences at the given indices. If num_steps is greater than 1, returns a tuple of size num_steps,
where each entry contains the experiences for each step.
info (Dict[str, Any]): dictionary of information about the sampled experiences.
- Return type
experiences (Sequence[Experience] or Tuple[Sequence[Experience], …])
- Raises
ValueError – If indices are empty or num_steps is 0 or negative.
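The append/sample semantics described above can be sketched with a minimal, stdlib-only ring buffer. This is an illustrative toy, not nnabla_rl's implementation; the experience tuples are simplified to (state, action, reward) for brevity:

```python
import random
from collections import deque


class SimpleReplayBuffer:
    """Toy sketch of the ReplayBuffer semantics (not nnabla_rl's implementation)."""

    def __init__(self, capacity=None):
        # deque with maxlen drops the oldest item when full, matching the
        # "head of the buffer is dropped" behavior in the Notes above.
        self._buffer = deque(maxlen=capacity)

    @property
    def capacity(self):
        # None means the buffer is unbounded.
        return self._buffer.maxlen

    def append(self, experience):
        self._buffer.append(experience)

    def append_all(self, experiences):
        self._buffer.extend(experiences)

    def sample(self, num_samples=1):
        if num_samples > len(self._buffer):
            raise ValueError('num_samples exceeds the number of stored experiences')
        return random.sample(list(self._buffer), num_samples)


buffer = SimpleReplayBuffer(capacity=3)
buffer.append_all([('s0', 0, 0.0), ('s1', 1, 1.0), ('s2', 0, 0.5)])
buffer.append(('s3', 1, -1.0))  # buffer is full: ('s0', 0, 0.0) is dropped
```

The real buffer additionally returns an info dictionary and supports multi-step (num_steps) sampling, which this sketch omits.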
List of ReplayBuffer¶
- class nnabla_rl.replay_buffers.DecorableReplayBuffer(capacity, decor_fun)[source]¶
Bases:
nnabla_rl.replay_buffer.ReplayBuffer
Buffer which can decorate the experience with an external decoration function
This buffer decorates each experience before the item is used for building the batch. The decoration function is called when __getitem__ is called. You can use this buffer to augment the data or add noise to the experience.
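The lazy-decoration idea can be sketched as follows. This is a hypothetical stdlib-only illustration, not the library's implementation; the experience layout and the decoration function are made up:

```python
class SimpleDecorableBuffer:
    """Toy sketch of a decorable buffer (not nnabla_rl's implementation)."""

    def __init__(self, capacity, decor_fun):
        self._capacity = capacity
        self._decor_fun = decor_fun
        self._buffer = []

    def append(self, experience):
        if len(self._buffer) == self._capacity:
            self._buffer.pop(0)  # drop the oldest experience
        self._buffer.append(experience)

    def __getitem__(self, index):
        # Decoration happens lazily, at fetch time, so the stored
        # experience itself is never modified.
        return self._decor_fun(self._buffer[index])


def add_reward_offset(experience):
    # Illustrative decoration: shift the reward (could equally be data
    # augmentation or noise injection on the state).
    state, action, reward = experience
    return (state, action, reward + 1.0)


buffer = SimpleDecorableBuffer(capacity=10, decor_fun=add_reward_offset)
buffer.append(('s0', 0, 0.5))
```

Because decoration runs on every fetch, a stochastic decor_fun (e.g. random noise) yields a different decorated experience each time the same item is sampled.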
- class nnabla_rl.replay_buffers.HindsightReplayBuffer(reward_function: Callable[[numpy.ndarray, numpy.ndarray, Dict[str, Any]], Any], hindsight_prob: float = 0.8, capacity: Optional[int] = None)[source]¶
Bases:
nnabla_rl.replay_buffer.ReplayBuffer
- append(experience: Tuple[Union[numpy.ndarray, Tuple[numpy.ndarray, ...]], numpy.ndarray, float, float, Union[numpy.ndarray, Tuple[numpy.ndarray, ...]], Dict[str, Any]])[source]¶
Add new experience to the replay buffer.
- Parameters
experience (array-like) – Experience that includes transitions such as state, action, reward, and a flag indicating whether the episode has terminated. See [Replay buffer documents](replay_buffer.md) for more information.
Notes
If the replay buffer is full, the oldest experience (at the head of the buffer) is dropped and the given experience is appended to the tail of the buffer.
- sample_indices(indices: Sequence[int], num_steps: int = 1) Tuple[Sequence[Tuple[Union[numpy.ndarray, Tuple[numpy.ndarray, ...]], numpy.ndarray, float, float, Union[numpy.ndarray, Tuple[numpy.ndarray, ...]], Dict[str, Any]]], Dict[str, Any]] [source]¶
Sample experiences for given indices from the replay buffer.
- Parameters
indices (array-like) – list of indices of the experiences to sample
num_steps (int) – Number of timesteps to sample. Should be greater than 0. Defaults to 1.
- Returns
- Experiences at the given indices. If num_steps is greater than 1, returns a tuple of size num_steps,
where each entry contains the experiences for each step.
info (Dict[str, Any]): dictionary of information about the sampled experiences.
- Return type
experiences (Sequence[Experience] or Tuple[Sequence[Experience], …])
- Raises
ValueError – If indices are empty or num_steps is 0 or negative.
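The hindsight idea behind this buffer can be sketched as goal relabeling: with probability hindsight_prob, the desired goal of a transition is replaced by a goal actually achieved later in the same episode, and the reward is recomputed with reward_function. The sketch below is a hypothetical stdlib-only illustration; the dictionary keys (achieved_goal, desired_goal) and the "future" relabeling strategy are assumptions, not nnabla_rl's exact data layout:

```python
import random


def relabel_episode(episode, reward_function, hindsight_prob=0.8, rng=None):
    """Toy sketch of hindsight relabeling (not nnabla_rl's implementation)."""
    rng = rng or random.Random(0)
    relabeled = []
    for i, exp in enumerate(episode):
        exp = dict(exp)  # copy so the stored episode stays untouched
        if rng.random() < hindsight_prob:
            # Replace the desired goal with a goal achieved later in the
            # same episode ("future" strategy).
            future = rng.choice(episode[i:])
            exp['desired_goal'] = future['achieved_goal']
        # Recompute the reward against the (possibly new) goal.
        exp['reward'] = reward_function(exp['achieved_goal'], exp['desired_goal'], exp)
        relabeled.append(exp)
    return relabeled


# With hindsight_prob=1.0 and a one-step episode, the transition is
# relabeled with its own achieved goal, so the sparse reward becomes 0.
episode = [{'achieved_goal': 'g1', 'desired_goal': 'g9', 'reward': -1.0}]
relabeled = relabel_episode(
    episode, lambda achieved, desired, info: 0.0 if achieved == desired else -1.0,
    hindsight_prob=1.0)
```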
- class nnabla_rl.replay_buffers.MemoryEfficientAtariBuffer(capacity: int)[source]¶
Bases:
nnabla_rl.replay_buffer.ReplayBuffer
Buffer designed to compactly save experiences of Atari environments used in DQN. DQN (and other training algorithms) requires a large replay buffer when training on Atari games. If you naively save the experiences, you will need more than 100 GB to save them (assuming 1M experiences), which usually does not fit in the machine's memory (unless you have money :). This replay buffer reduces the size of each experience by casting the images to uint8 and removing the old frames concatenated to the observation. By using this buffer, you can hold 1M experiences using only about 20 GB of memory. Note that this class is designed only for DQN-style training on Atari environments (i.e. the state consists of 4 concatenated grayscaled frames and its values are normalized between 0 and 1).
- append(experience)[source]¶
Add new experience to the replay buffer.
- Parameters
experience (array-like) – Experience that includes transitions such as state, action, reward, and a flag indicating whether the episode has terminated. See [Replay buffer documents](replay_buffer.md) for more information.
Notes
If the replay buffer is full, the oldest experience (at the head of the buffer) is dropped and the given experience is appended to the tail of the buffer.
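The savings described above can be checked with back-of-the-envelope arithmetic. The constants below assume the standard DQN setup (84x84 grayscale frames, a 4-frame stack); the exact figures depend on the implementation, which also stores actions, rewards, and bookkeeping:

```python
# Rough arithmetic for the memory savings (approximations only).
FRAME = 84 * 84          # pixels in one grayscaled Atari frame
STACK = 4                # frames concatenated into a single DQN state
N = 1_000_000            # number of stored experiences

# Naive storage: float32 (4 bytes per pixel) stacked state and next state
# saved separately for every experience.
naive_bytes = N * 2 * STACK * FRAME * 4

# Compact storage: one uint8 (1 byte per pixel) frame per step; the 4-frame
# stack and the next state are reconstructed from neighboring entries when
# the experience is sampled.
compact_bytes = N * 1 * FRAME * 1

print(f'naive:   {naive_bytes / 1e9:.0f} GB')   # well over 100 GB
print(f'compact: {compact_bytes / 1e9:.0f} GB')  # an order of magnitude less
```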
- class nnabla_rl.replay_buffers.PrioritizedReplayBuffer(capacity: int, alpha: float = 0.6, beta: float = 0.4, betasteps: int = 10000, error_clip: Optional[Tuple[float, float]] = (- 1, 1), epsilon: float = 1e-08, reset_segment_interval: int = 1000, sort_interval: int = 1000000, variant: str = 'proportional')[source]¶
Bases:
nnabla_rl.replay_buffer.ReplayBuffer
- append(experience)[source]¶
Add new experience to the replay buffer.
- Parameters
experience (array-like) – Experience that includes transitions such as state, action, reward, and a flag indicating whether the episode has terminated. See [Replay buffer documents](replay_buffer.md) for more information.
Notes
If the replay buffer is full, the oldest experience (at the head of the buffer) is dropped and the given experience is appended to the tail of the buffer.
- append_all(experiences)[source]¶
Add list of experiences to the replay buffer.
- Parameters
experiences (Sequence[Experience]) – Sequence of experiences to insert to the buffer
Notes
If the replay buffer is full, the oldest experiences (at the head of the buffer) are dropped and the given experiences are appended to the tail of the buffer.
- property capacity¶
Capacity (max length) of this replay buffer, or None if the buffer is unbounded.
- sample(num_samples: int = 1, num_steps: int = 1)[source]¶
Randomly sample num_samples experiences from the replay buffer.
- Parameters
num_samples (int) – Number of samples to sample from the replay buffer. Defaults to 1.
num_steps (int) – Number of timesteps to sample. Should be greater than 0. Defaults to 1.
- Returns
- num_samples randomly sampled experiences. If num_steps is greater than 1, returns a tuple of size num_steps,
where each entry contains num_samples experiences.
info (Dict[str, Any]): dictionary of information about the sampled experiences.
- Return type
experiences (Sequence[Experience] or Tuple[Sequence[Experience], …])
- Raises
ValueError – num_samples exceeds the maximum possible index or num_steps is 0 or negative.
Notes
Sampling strategy depends on the underlying implementation.
- sample_indices(indices: Sequence[int], num_steps: int = 1)[source]¶
Sample experiences for given indices from the replay buffer.
- Parameters
indices (array-like) – list of indices of the experiences to sample
num_steps (int) – Number of timesteps to sample. Should be greater than 0. Defaults to 1.
- Returns
- Experiences at the given indices. If num_steps is greater than 1, returns a tuple of size num_steps,
where each entry contains the experiences for each step.
info (Dict[str, Any]): dictionary of information about the sampled experiences.
- Return type
experiences (Sequence[Experience] or Tuple[Sequence[Experience], …])
- Raises
ValueError – If indices are empty or num_steps is 0 or negative.
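The proportional variant's sampling rule can be sketched in a few lines: priorities are (clipped) TD errors raised to alpha, sampling probabilities are the normalized priorities, and beta controls the strength of the importance-sampling correction. This is a hypothetical stdlib-only illustration of the general technique from the prioritized experience replay literature, not nnabla_rl's segment-tree implementation:

```python
import random


def prioritized_sample(errors, num_samples, alpha=0.6, beta=0.4,
                       epsilon=1e-8, rng=None):
    """Toy sketch of proportional prioritized sampling."""
    rng = rng or random.Random(0)
    # p_i = (|error_i| + epsilon) ** alpha; epsilon keeps every
    # experience sampleable even when its TD error is zero.
    priorities = [(abs(e) + epsilon) ** alpha for e in errors]
    total = sum(priorities)
    probs = [p / total for p in priorities]
    indices = rng.choices(range(len(errors)), weights=probs, k=num_samples)
    # Importance-sampling weights w_i = (N * P(i)) ** (-beta),
    # normalized by the maximum possible weight so that w_i <= 1.
    n = len(errors)
    max_w = (n * min(probs)) ** (-beta)
    weights = [((n * probs[i]) ** (-beta)) / max_w for i in indices]
    return indices, weights


# Experiences with large TD error dominate the sampled batch.
errors = [0.0] * 9 + [10.0]
indices, weights = prioritized_sample(errors, num_samples=100)
```

With alpha=0 this degenerates to uniform sampling, and beta is typically annealed toward 1 over betasteps iterations so that the bias correction becomes exact by the end of training.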