Distributions
All probability distributions are derived from nnabla_rl.distributions.Distribution.
Distribution
- class nnabla_rl.distributions.Distribution
- choose_probable() → Variable | ndarray
Compute the most probable action of the distribution.
- Returns:
Probable action of the distribution
- Return type:
Union[nn.Variable, np.ndarray]
- entropy() → Variable | ndarray
Compute the entropy of the distribution.
- Returns:
Entropy of the distribution
- Return type:
Union[nn.Variable, np.ndarray]
- kl_divergence(q: Distribution) → Variable | ndarray
Compute the Kullback-Leibler divergence between this distribution and the given distribution. This function computes KL(self||q).
- Parameters:
q (nnabla_rl.distributions.Distribution) – Target distribution against which the KL divergence is computed
- Returns:
Kullback-Leibler divergence
- Return type:
Union[nn.Variable, np.ndarray]
- Raises:
ValueError – The target distribution's type does not match the type of this distribution.
- log_prob(x: Variable | ndarray) → Variable | ndarray
Compute the log probability of given input.
- Parameters:
x (Union[nn.Variable, np.ndarray]) – Target value to compute the log probability
- Returns:
Log probability of given input
- Return type:
Union[nn.Variable, np.ndarray]
- mean() → Variable | ndarray
Compute the mean of the distribution (if it exists).
- Returns:
Mean of the distribution
- Return type:
Union[nn.Variable, np.ndarray]
- Raises:
NotImplementedError – The distribution does not have a mean
- property ndim: int
The number of dimensions of the distribution.
- abstract sample(noise_clip: Tuple[float, float] | None = None) → Variable | ndarray
Sample a value from the distribution. If noise_clip is specified, the sampled value is clipped to the given range. Whether noise_clip applies depends on the underlying implementation.
- Parameters:
noise_clip (Tuple[float, float], optional) – float tuple of size 2 which contains the min and max value of the noise.
- Returns:
Sampled value
- Return type:
Union[nn.Variable, np.ndarray]
- sample_and_compute_log_prob(noise_clip: Tuple[float, float] | None = None) → Tuple[Variable, Variable] | Tuple[ndarray, ndarray]
Sample a value from the distribution and compute its log probability.
- Parameters:
noise_clip (Tuple[float, float], optional) – float tuple of size 2 which contains the min and max value of the noise.
- Returns:
Sampled value and its log probability
- Return type:
Union[Tuple[nn.Variable, nn.Variable], Tuple[np.ndarray, np.ndarray]]
- sample_multiple(num_samples: int, noise_clip: Tuple[float, float] | None = None) → Variable | ndarray
Sample multiple values from the distribution. A new axis is added between the first and second axes. Therefore, for a mean and variance of shape (batch_size, data_shape), the returned value has shape (batch_size, num_samples, data_shape).
If noise_clip is specified, the sampled values are clipped to the given range. Whether noise_clip applies depends on the underlying implementation.
- Parameters:
num_samples (int) – number of samples per batch
noise_clip (Tuple[float, float], optional) – float tuple of size 2 which contains the min and max value of the noise.
- Returns:
Sampled value.
- Return type:
Union[nn.Variable, np.ndarray]
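The shape rule described for sample_multiple can be illustrated with a small NumPy sketch. This is not the library's implementation: the helper name sample_multiple_gaussian is hypothetical, a Gaussian is used only as a convenient example, and clipping the standard-normal noise before scaling is an assumption about how noise_clip is applied.

```python
import numpy as np


def sample_multiple_gaussian(mean, ln_var, num_samples, noise_clip=None, rng=None):
    """Hypothetical helper: draw num_samples values per batch entry."""
    rng = np.random.default_rng() if rng is None else rng
    batch_size = mean.shape[0]
    data_shape = mean.shape[1:]
    # Standard-normal noise with the new sample axis at position 1.
    noise = rng.standard_normal((batch_size, num_samples) + data_shape)
    if noise_clip is not None:
        # Assumption: the clip is applied to the noise, not to the final sample.
        noise = np.clip(noise, noise_clip[0], noise_clip[1])
    std = np.exp(0.5 * ln_var)  # ln_var is log(sigma^2)
    # (batch, 1, data) broadcasts against (batch, num_samples, data).
    return mean[:, None] + std[:, None] * noise


mean = np.zeros((4, 3))    # (batch_size, data_shape)
ln_var = np.zeros((4, 3))  # log-variance 0 means unit variance
samples = sample_multiple_gaussian(
    mean, ln_var, num_samples=5, noise_clip=(-0.5, 0.5), rng=np.random.default_rng(0)
)
print(samples.shape)
```

With a zero mean and unit variance, every sample in this sketch stays inside the clip range, and the sample axis sits between the batch axis and the data axes.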
List of Distributions
- class nnabla_rl.distributions.Bernoulli(z)
Bases:
DiscreteDistribution
Bernoulli distribution.
\(p^{k}(1-p)^{1-k} \enspace \text{for}\ k\in\{0,1\}\).
- Parameters:
z (nn.Variable) – The probability of outputting 1 is computed as \(p=\text{sigmoid}(z)\).
- choose_probable()
Compute the most probable action of the distribution.
- Returns:
Probable action of the distribution
- Return type:
Union[nn.Variable, np.ndarray]
- entropy()
Compute the entropy of the distribution.
- Returns:
Entropy of the distribution
- Return type:
Union[nn.Variable, np.ndarray]
- kl_divergence(q)
Compute the Kullback-Leibler divergence between this distribution and the given distribution. This function computes KL(self||q).
- Parameters:
q (nnabla_rl.distributions.Distribution) – Target distribution against which the KL divergence is computed
- Returns:
Kullback-Leibler divergence
- Return type:
Union[nn.Variable, np.ndarray]
- Raises:
ValueError – The target distribution's type does not match the type of this distribution.
- log_prob(x)
Compute the log probability of given input.
- Parameters:
x (Union[nn.Variable, np.ndarray]) – Target value to compute the log probability
- Returns:
Log probability of given input
- Return type:
Union[nn.Variable, np.ndarray]
- mean()
Compute the mean of the distribution (if it exists).
- Returns:
Mean of the distribution
- Return type:
Union[nn.Variable, np.ndarray]
- Raises:
NotImplementedError – The distribution does not have a mean
- property ndim
The number of dimensions of the distribution.
- sample(noise_clip=None)
Sample a value from the distribution.
- Parameters:
noise_clip (Tuple[float, float], optional) – noise_clip has no effect for the Bernoulli distribution.
- Returns:
Sampled value.
- Return type:
nn.Variable
- sample_and_compute_log_prob(noise_clip=None)
Sample a value from the distribution and compute its log probability.
- Parameters:
noise_clip (Tuple[float, float], optional) – noise_clip has no effect for the Bernoulli distribution.
- Returns:
Sampled value and its log probability
- Return type:
Tuple[nn.Variable, nn.Variable]
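As a plain-Python illustration of the Bernoulli formulas above, the following sketch evaluates the log probability, entropy, and KL(self||q) for scalar z. The function names are hypothetical and the code works on Python floats rather than nn.Variable, so it only mirrors the math, not the library's code.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real-valued z to a probability p."""
    return 1.0 / (1.0 + math.exp(-z))


def bernoulli_log_prob(z, k):
    """log(p^k (1-p)^(1-k)) for k in {0, 1}, with p = sigmoid(z)."""
    p = sigmoid(z)
    return k * math.log(p) + (1 - k) * math.log(1.0 - p)


def bernoulli_entropy(z):
    """Entropy -p log p - (1-p) log(1-p), in nats."""
    p = sigmoid(z)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def bernoulli_kl(z_self, z_q):
    """KL(self || q) between two Bernoulli distributions given their z values."""
    p, q = sigmoid(z_self), sigmoid(z_q)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


z = 0.0  # sigmoid(0) = 0.5, so both outcomes are equally likely
log_p_one = bernoulli_log_prob(z, 1)
log_p_zero = bernoulli_log_prob(z, 0)
print(log_p_one, log_p_zero, bernoulli_entropy(z))
```

The naive log(p) form is kept for readability; it loses precision for very large |z|, where a softplus-based formulation would be preferable.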
- class nnabla_rl.distributions.Gaussian(mean: Variable | ndarray, ln_var: Variable | ndarray)
Bases:
ContinuosDistribution
Gaussian distribution.
\(\mathcal{N}(\mu,\,\sigma^{2})\)
- Parameters:
mean (nn.Variable) – Mean \(\mu\) of the Gaussian distribution.
ln_var (nn.Variable) – logarithm of the variance \(\sigma^{2}\). (i.e. ln_var is \(\log{\sigma^{2}}\))
- choose_probable()
Compute the most probable action of the distribution.
- Returns:
Probable action of the distribution
- Return type:
Union[nn.Variable, np.ndarray]
- entropy()
Compute the entropy of the distribution.
- Returns:
Entropy of the distribution
- Return type:
Union[nn.Variable, np.ndarray]
- kl_divergence(q)
Compute the Kullback-Leibler divergence between this distribution and the given distribution. This function computes KL(self||q).
- Parameters:
q (nnabla_rl.distributions.Distribution) – Target distribution against which the KL divergence is computed
- Returns:
Kullback-Leibler divergence
- Return type:
Union[nn.Variable, np.ndarray]
- Raises:
ValueError – The target distribution's type does not match the type of this distribution.
- log_prob(x)
Compute the log probability of given input.
- Parameters:
x (Union[nn.Variable, np.ndarray]) – Target value to compute the log probability
- Returns:
Log probability of given input
- Return type:
Union[nn.Variable, np.ndarray]
- mean()
Compute the mean of the distribution (if it exists).
- Returns:
Mean of the distribution
- Return type:
Union[nn.Variable, np.ndarray]
- Raises:
NotImplementedError – The distribution does not have a mean
- property ndim
The number of dimensions of the distribution.
- sample(noise_clip=None)
Sample a value from the distribution. If noise_clip is specified, the sampled value is clipped to the given range. Whether noise_clip applies depends on the underlying implementation.
- Parameters:
noise_clip (Tuple[float, float], optional) – float tuple of size 2 which contains the min and max value of the noise.
- Returns:
Sampled value
- Return type:
Union[nn.Variable, np.ndarray]
- sample_and_compute_log_prob(noise_clip=None)
Sample a value from the distribution and compute its log probability.
- Parameters:
noise_clip (Tuple[float, float], optional) – float tuple of size 2 which contains the min and max value of the noise.
- Returns:
Sampled value and its log probability
- Return type:
Union[Tuple[nn.Variable, nn.Variable], Tuple[np.ndarray, np.ndarray]]
- sample_multiple(num_samples, noise_clip=None)
Sample multiple values from the distribution. A new axis is added between the first and second axes. Therefore, for a mean and variance of shape (batch_size, data_shape), the returned value has shape (batch_size, num_samples, data_shape).
If noise_clip is specified, the sampled values are clipped to the given range. Whether noise_clip applies depends on the underlying implementation.
- Parameters:
num_samples (int) – number of samples per batch
noise_clip (Tuple[float, float], optional) – float tuple of size 2 which contains the min and max value of the noise.
- Returns:
Sampled value.
- Return type:
Union[nn.Variable, np.ndarray]
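Because Gaussian is parameterized by ln_var rather than by the standard deviation, it can help to see the closed-form quantities written in terms of \(\log{\sigma^{2}}\). The sketch below uses hypothetical function names and scalar Python floats, and treats a single dimension; whether the library sums over data dimensions is not stated here and is not assumed.

```python
import math


def gaussian_log_prob(x, mean, ln_var):
    """Log density of N(mean, exp(ln_var)) at x, for one dimension."""
    var = math.exp(ln_var)
    return -0.5 * (math.log(2.0 * math.pi) + ln_var + (x - mean) ** 2 / var)


def gaussian_entropy(ln_var):
    """Differential entropy 0.5 * log(2 * pi * e * sigma^2), in nats."""
    return 0.5 * (1.0 + math.log(2.0 * math.pi) + ln_var)


def gaussian_kl(mean_p, ln_var_p, mean_q, ln_var_q):
    """KL(p || q) between two one-dimensional Gaussians."""
    var_p, var_q = math.exp(ln_var_p), math.exp(ln_var_q)
    return 0.5 * (ln_var_q - ln_var_p + (var_p + (mean_p - mean_q) ** 2) / var_q - 1.0)


# Unit-variance Gaussians (ln_var = 0) whose means differ by 1.
kl_shifted = gaussian_kl(0.0, 0.0, 1.0, 0.0)
print(gaussian_log_prob(0.0, 0.0, 0.0), gaussian_entropy(0.0), kl_shifted)
```

For two unit-variance Gaussians the KL divergence reduces to half the squared distance between the means, which gives a quick sanity check on the formula.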
- class nnabla_rl.distributions.Softmax(z)
Bases:
DiscreteDistribution
Softmax distribution which samples a class index \(i\) according to the following probability.
\(i \sim \frac{\exp{z_{i}}}{\sum_{j}\exp{z_{j}}}\).
- Parameters:
z (nn.Variable) – logits \(z\). The dimension of the logits should equal the number of classes to sample from.
- choose_probable()
Compute the most probable action of the distribution.
- Returns:
Probable action of the distribution
- Return type:
Union[nn.Variable, np.ndarray]
- entropy()
Compute the entropy of the distribution.
- Returns:
Entropy of the distribution
- Return type:
Union[nn.Variable, np.ndarray]
- kl_divergence(q)
Compute the Kullback-Leibler divergence between this distribution and the given distribution. This function computes KL(self||q).
- Parameters:
q (nnabla_rl.distributions.Distribution) – Target distribution against which the KL divergence is computed
- Returns:
Kullback-Leibler divergence
- Return type:
Union[nn.Variable, np.ndarray]
- Raises:
ValueError – The target distribution's type does not match the type of this distribution.
- log_prob(x)
Compute the log probability of given input.
- Parameters:
x (Union[nn.Variable, np.ndarray]) – Target value to compute the log probability
- Returns:
Log probability of given input
- Return type:
Union[nn.Variable, np.ndarray]
- mean()
Compute the mean of the distribution (if it exists).
- Returns:
Mean of the distribution
- Return type:
Union[nn.Variable, np.ndarray]
- Raises:
NotImplementedError – The distribution does not have a mean
- property ndim
The number of dimensions of the distribution.
- sample(noise_clip=None)
Sample a value from the distribution. If noise_clip is specified, the sampled value is clipped to the given range. Whether noise_clip applies depends on the underlying implementation.
- Parameters:
noise_clip (Tuple[float, float], optional) – float tuple of size 2 which contains the min and max value of the noise.
- Returns:
Sampled value
- Return type:
Union[nn.Variable, np.ndarray]
- sample_and_compute_log_prob(noise_clip=None)
Sample a value from the distribution and compute its log probability.
- Parameters:
noise_clip (Tuple[float, float], optional) – float tuple of size 2 which contains the min and max value of the noise.
- Returns:
Sampled value and its log probability
- Return type:
Union[Tuple[nn.Variable, nn.Variable], Tuple[np.ndarray, np.ndarray]]
- sample_multiple(num_samples, noise_clip=None)
Sample multiple values from the distribution. A new axis is added between the first and second axes. Therefore, for a mean and variance of shape (batch_size, data_shape), the returned value has shape (batch_size, num_samples, data_shape).
If noise_clip is specified, the sampled values are clipped to the given range. Whether noise_clip applies depends on the underlying implementation.
- Parameters:
num_samples (int) – number of samples per batch
noise_clip (Tuple[float, float], optional) – float tuple of size 2 which contains the min and max value of the noise.
- Returns:
Sampled value.
- Return type:
Union[nn.Variable, np.ndarray]
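The softmax probability above can be sketched in plain Python for a single logit vector. The function names are hypothetical, the code operates on Python lists rather than nn.Variable, and taking the argmax as the most probable class is an assumption about what choose_probable returns.

```python
import math
import random


def softmax(z):
    """Probabilities exp(z_i) / sum_j exp(z_j), shifted by max(z) for stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_log_prob(z, i):
    """Log probability of class index i, via the log-sum-exp identity."""
    m = max(z)
    return z[i] - (m + math.log(sum(math.exp(v - m) for v in z)))


def most_probable_index(z):
    """Index of the largest logit (assumed behaviour of choose_probable)."""
    return max(range(len(z)), key=lambda j: z[j])


def sample_index(z, rng):
    """Draw a class index i with probability softmax(z)[i]."""
    return rng.choices(range(len(z)), weights=softmax(z), k=1)[0]


logits = [0.1, 2.0, -1.0]
rng = random.Random(0)
index = sample_index(logits, rng)
print(index, softmax(logits), most_probable_index(logits))
```

Subtracting the maximum logit before exponentiating leaves the probabilities unchanged and avoids overflow for large logits.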
- class nnabla_rl.distributions.SquashedGaussian(mean, ln_var)
Bases:
ContinuosDistribution
Gaussian distribution whose output is squashed with tanh.
\(z \sim \mathcal{N}(\mu,\,\sigma^{2})\). \(out = \tanh{z}\).
- Parameters:
mean (nn.Variable) – Mean \(\mu\) of the underlying Gaussian distribution.
ln_var (nn.Variable) – logarithm of the variance \(\sigma^{2}\). (i.e. ln_var is \(\log{\sigma^{2}}\))
Note
The log probability and kl_divergence of this distribution are different from those of the Gaussian distribution because the output is squashed.
- choose_probable()
Compute the most probable action of the distribution.
- Returns:
Probable action of the distribution
- Return type:
Union[nn.Variable, np.ndarray]
- log_prob(x)
Compute the log probability of given input.
- Parameters:
x (Union[nn.Variable, np.ndarray]) – Target value to compute the log probability
- Returns:
Log probability of given input
- Return type:
Union[nn.Variable, np.ndarray]
- property ndim
The number of dimensions of the distribution.
- sample(noise_clip=None)
Sample a value from the distribution. If noise_clip is specified, the sampled value is clipped to the given range. Whether noise_clip applies depends on the underlying implementation.
- Parameters:
noise_clip (Tuple[float, float], optional) – float tuple of size 2 which contains the min and max value of the noise.
- Returns:
Sampled value
- Return type:
Union[nn.Variable, np.ndarray]
- sample_and_compute_log_prob(noise_clip=None)
NOTE: To avoid sampling different random values for sample and log_prob, use nnabla.forward_all([sample, log_prob]). If you forward the two variables independently, you will get a log_prob for a different sample, because different random values are sampled internally.
- sample_multiple(num_samples, noise_clip=None)
Sample multiple values from the distribution. A new axis is added between the first and second axes. Therefore, for a mean and variance of shape (batch_size, data_shape), the returned value has shape (batch_size, num_samples, data_shape).
If noise_clip is specified, the sampled values are clipped to the given range. Whether noise_clip applies depends on the underlying implementation.
- Parameters:
num_samples (int) – number of samples per batch
noise_clip (Tuple[float, float], optional) – float tuple of size 2 which contains the min and max value of the noise.
- Returns:
Sampled value.
- Return type:
Union[nn.Variable, np.ndarray]
- sample_multiple_and_compute_log_prob(num_samples, noise_clip=None)
NOTE: To avoid sampling different random values for sample and log_prob, use nnabla.forward_all([sample, log_prob]). If you forward the two variables independently, you will get a log_prob for a different sample, because different random values are sampled internally.
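The reason the squashed log probability differs from the Gaussian one is the change of variables through tanh, which subtracts \(\log(1-\tanh^{2}{z})\) from the Gaussian log density. The sketch below shows this for one dimension with hypothetical function names and plain Python floats. It also shows why the sample and its log probability must come from the same noise draw. It is a basic form of the formula, not the library's implementation, and it becomes numerically unstable for large |z|.

```python
import math
import random


def gaussian_log_prob(z, mean, ln_var):
    """Log density of N(mean, exp(ln_var)) at the pre-squash value z."""
    var = math.exp(ln_var)
    return -0.5 * (math.log(2.0 * math.pi) + ln_var + (z - mean) ** 2 / var)


def squashed_log_prob(z, mean, ln_var):
    """Log density of out = tanh(z), given the pre-squash value z."""
    out = math.tanh(z)
    # Change of variables: d(out)/dz = 1 - tanh(z)^2.
    return gaussian_log_prob(z, mean, ln_var) - math.log(1.0 - out ** 2)


def sample_and_log_prob(mean, ln_var, rng):
    """Draw one noise value and reuse it for both the sample and its log prob."""
    noise = rng.gauss(0.0, 1.0)
    z = mean + math.exp(0.5 * ln_var) * noise
    return math.tanh(z), squashed_log_prob(z, mean, ln_var)


rng = random.Random(0)
out, log_prob = sample_and_log_prob(0.0, 0.0, rng)
print(out, log_prob)
```

If the sample and the log probability were each computed from a separate noise draw, the returned log probability would describe a different sample, which is the situation the note above warns about.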
- class nnabla_rl.distributions.OneHotSoftmax(z)
Bases:
Softmax
Softmax distribution which samples a one-hot vector whose entry at class index \(i\) is 1. The class index is sampled according to the following distribution.
\(i \sim \frac{\exp{z_{i}}}{\sum_{j}\exp{z_{j}}}\).
- Parameters:
z (nn.Variable) – logits \(z\). The dimension of the logits should equal the number of classes to sample from.
- choose_probable()
Compute the most probable action of the distribution.
- Returns:
Probable action of the distribution
- Return type:
Union[nn.Variable, np.ndarray]
- log_prob(x)
Compute the log probability of given input.
- Parameters:
x (Union[nn.Variable, np.ndarray]) – Target value to compute the log probability
- Returns:
Log probability of given input
- Return type:
Union[nn.Variable, np.ndarray]
- property ndim
The number of dimensions of the distribution.
- sample(noise_clip=None)
Sample a value from the distribution. If noise_clip is specified, the sampled value is clipped to the given range. Whether noise_clip applies depends on the underlying implementation.
- Parameters:
noise_clip (Tuple[float, float], optional) – float tuple of size 2 which contains the min and max value of the noise.
- Returns:
Sampled value
- Return type:
Union[nn.Variable, np.ndarray]
- sample_and_compute_log_prob(noise_clip=None)
Sample a value from the distribution and compute its log probability.
- Parameters:
noise_clip (Tuple[float, float], optional) – float tuple of size 2 which contains the min and max value of the noise.
- Returns:
Sampled value and its log probability
- Return type:
Union[Tuple[nn.Variable, nn.Variable], Tuple[np.ndarray, np.ndarray]]
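The difference between a class-index sample and a one-hot sample can be sketched in plain Python. The function names are hypothetical and the code uses lists instead of nn.Variable; evaluating the log probability of a one-hot vector as the dot product with the log probabilities is an assumption made for illustration.

```python
import math
import random


def softmax(z):
    """Probabilities exp(z_i) / sum_j exp(z_j), shifted by max(z) for stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(index, num_classes):
    """Vector with 1.0 at the given index and 0.0 elsewhere."""
    return [1.0 if j == index else 0.0 for j in range(num_classes)]


def sample_one_hot(z, rng):
    """Sample a class index from softmax(z) and return it as a one-hot vector."""
    index = rng.choices(range(len(z)), weights=softmax(z), k=1)[0]
    return one_hot(index, len(z))


def one_hot_log_prob(z, x):
    """Log probability of the class selected by the one-hot vector x."""
    return sum(x_j * math.log(p_j) for x_j, p_j in zip(x, softmax(z)))


logits = [0.0, 0.0, 0.0]
sample = sample_one_hot(logits, random.Random(0))
print(sample, one_hot_log_prob(logits, sample))
```

With equal logits every class has probability 1/3, so any one-hot sample has log probability \(-\log 3\).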