Distributions

All probability distributions are derived from nnabla_rl.distributions.Distribution.

Distribution

class nnabla_rl.distributions.Distribution
choose_probable() → nnabla._variable.Variable

Compute the most probable action of the distribution

Returns

Probable action of the distribution

Return type

nnabla.Variable

entropy() → nnabla._variable.Variable

Compute the entropy of the distribution

Returns

Entropy of the distribution

Return type

nn.Variable

kl_divergence(q: nnabla_rl.distributions.distribution.Distribution) → nnabla._variable.Variable

Compute the Kullback-Leibler divergence between this distribution and the given distribution q. This function computes KL(self||q).

Parameters

q (nnabla_rl.distributions.Distribution) – target distribution against which the KL divergence is computed

Returns

Kullback-Leibler divergence

Return type

nn.Variable

Raises

ValueError – target distribution’s type does not match with current distribution type.
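The direction of the divergence matters, because KL(self||q) and KL(q||self) generally differ. The following is a minimal sketch in plain Python, not the library's implementation: the Categorical class is a hypothetical stand-in that only mirrors the documented contract (compute KL(self||q), raise ValueError for a mismatched type).

```python
import math


class Categorical:
    """Hypothetical stand-in for illustration; not an nnabla_rl class."""

    def __init__(self, probs):
        self.probs = list(probs)

    def kl_divergence(self, q):
        # Mirror the documented contract: reject a different distribution type.
        if not isinstance(q, type(self)):
            raise ValueError("q must have the same distribution type as self")
        # KL(self||q) = sum_i p_i * (log p_i - log q_i)
        return sum(p * (math.log(p) - math.log(q_i))
                   for p, q_i in zip(self.probs, q.probs) if p > 0.0)


p = Categorical([0.9, 0.1])
q = Categorical([0.5, 0.5])
kl_pq = p.kl_divergence(q)  # KL(p||q)
kl_qp = q.kl_divergence(p)  # KL(q||p), generally a different number
```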

log_prob(x: nnabla._variable.Variable) → nnabla._variable.Variable

Compute the log probability of given input

Parameters

x (nn.Variable) – Target value to compute the log probability

Returns

Log probability of given input

Return type

nn.Variable

mean() → nnabla._variable.Variable

Compute the mean of the distribution (if it exists)

Returns

Mean of the distribution

Return type

nn.Variable

Raises

NotImplementedError – The distribution does not have a mean

property ndim: int

The number of dimensions of the distribution

abstract sample(noise_clip: Optional[Tuple[float, float]] = None) → nnabla._variable.Variable

Sample a value from the distribution. If noise_clip is specified, the sampled value will be clipped in the given range. Applicability of noise_clip depends on underlying implementation.

Parameters

noise_clip (Tuple[float, float], optional) – float tuple of size 2 which contains the min and max value of the noise.

Returns

Sampled value

Return type

nn.Variable

sample_and_compute_log_prob(noise_clip: Optional[Tuple[float, float]] = None) → Tuple[nnabla._variable.Variable, nnabla._variable.Variable]

Sample a value from the distribution and compute its log probability.

Parameters

noise_clip (Tuple[float, float], optional) – float tuple of size 2 which contains the min and max value of the noise.

Returns

Sampled value and its log probability

Return type

Tuple[nn.Variable, nn.Variable]

sample_multiple(num_samples: int, noise_clip: Optional[Tuple[float, float]] = None) → nnabla._variable.Variable

Sample multiple values from the distribution. A new axis is added between the first and second axes. Therefore, for a mean and variance of shape (batch_size, data_shape), the returned value has shape (batch_size, num_samples, data_shape).

If noise_clip is specified, sampled values will be clipped in the given range. Applicability of noise_clip depends on underlying implementation.

Parameters
  • num_samples (int) – number of samples per batch

  • noise_clip (Tuple[float, float], optional) – float tuple of size 2 which contains the min and max value of the noise.

Returns

Sampled value.

Return type

nn.Variable
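The shape convention of sample_multiple can be illustrated with nested Python lists. This is only a sketch under assumptions: the helper below is hypothetical, uses unit-variance Gaussian noise, and stands in for the library's nn.Variable arithmetic.

```python
import random


def sample_multiple(means, num_samples, rng):
    """Map means of shape (batch_size, data_dim) to samples of shape
    (batch_size, num_samples, data_dim). Hypothetical helper, unit variance."""
    return [[[m + rng.gauss(0.0, 1.0) for m in row]
             for _ in range(num_samples)]
            for row in means]


rng = random.Random(0)
means = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]  # batch_size=2, data_shape=(3,)
samples = sample_multiple(means, num_samples=4, rng=rng)

# The new axis sits between the batch axis and the data axis.
shape = (len(samples), len(samples[0]), len(samples[0][0]))
```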

List of Distributions

class nnabla_rl.distributions.Bernoulli(z)

Bases: nnabla_rl.distributions.distribution.Distribution

Bernoulli distribution.

\(p^{k}(1-p)^{1-k} \enspace \text{for}\ k\in\{0,1\}\).

Parameters

z (nn.Variable) – Probability of outputting 1 is computed as \(p=sigmoid(z)\).
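As a worked illustration of the formula above, the log probability and entropy can be computed directly from z. This is a plain-Python sketch for scalar inputs, not the library's code; the function names are hypothetical and no numerical hardening for large |z| is attempted.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bernoulli_log_prob(z, k):
    """log(p^k * (1 - p)^(1 - k)) with p = sigmoid(z) and k in {0, 1}."""
    p = sigmoid(z)
    return k * math.log(p) + (1 - k) * math.log(1.0 - p)


def bernoulli_entropy(z):
    """-(p * log p + (1 - p) * log(1 - p)) with p = sigmoid(z)."""
    p = sigmoid(z)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))
```

With z = 0 the distribution is a fair coin, so both outcomes have log probability log(0.5) and the entropy is log(2).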

choose_probable()

Compute the most probable action of the distribution

Returns

Probable action of the distribution

Return type

nnabla.Variable

entropy()

Compute the entropy of the distribution

Returns

Entropy of the distribution

Return type

nn.Variable

kl_divergence(q)

Compute the Kullback-Leibler divergence between this distribution and the given distribution q. This function computes KL(self||q).

Parameters

q (nnabla_rl.distributions.Distribution) – target distribution against which the KL divergence is computed

Returns

Kullback-Leibler divergence

Return type

nn.Variable

Raises

ValueError – target distribution’s type does not match with current distribution type.

log_prob(x)

Compute the log probability of given input

Parameters

x (nn.Variable) – Target value to compute the log probability

Returns

Log probability of given input

Return type

nn.Variable

mean()

Compute the mean of the distribution (if it exists)

Returns

Mean of the distribution

Return type

nn.Variable

Raises

NotImplementedError – The distribution does not have a mean

property ndim

The number of dimensions of the distribution

sample(noise_clip=None)

Sample a value from the distribution.

Parameters

noise_clip (Tuple[float, float], optional) – noise_clip has no effect for the Bernoulli distribution.

Returns

Sampled value.

Return type

nn.Variable

sample_and_compute_log_prob(noise_clip=None)

Sample a value from the distribution and compute its log probability.

Parameters

noise_clip (Tuple[float, float], optional) – noise_clip has no effect for the Bernoulli distribution.

Returns

Sampled value and its log probability

Return type

Tuple[nn.Variable, nn.Variable]

class nnabla_rl.distributions.Gaussian(mean, ln_var)

Bases: nnabla_rl.distributions.distribution.Distribution

Gaussian distribution

\(\mathcal{N}(\mu,\,\sigma^{2})\)

Parameters
  • mean (nn.Variable) – mean \(\mu\) of gaussian distribution.

  • ln_var (nn.Variable) – logarithm of the variance \(\sigma^{2}\). (i.e. ln_var is \(\log{\sigma^{2}}\))
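Because the class is parameterized by the log variance rather than the standard deviation, it can help to see how the density follows from (mean, ln_var). The sketch below uses plain Python scalars and hypothetical function names; it shows the standard formulas, not the library's implementation.

```python
import math


def gaussian_log_prob(x, mean, ln_var):
    """Log density of N(mean, exp(ln_var)) evaluated at the scalar x."""
    var = math.exp(ln_var)
    return -0.5 * (math.log(2.0 * math.pi) + ln_var + (x - mean) ** 2 / var)


def gaussian_entropy(ln_var):
    """Differential entropy 0.5 * log(2 * pi * e * sigma^2)."""
    return 0.5 * (math.log(2.0 * math.pi * math.e) + ln_var)


# ln_var = 0 corresponds to unit variance, since exp(0) = 1.
log_p_at_mean = gaussian_log_prob(0.0, 0.0, 0.0)
```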

choose_probable()

Compute the most probable action of the distribution

Returns

Probable action of the distribution

Return type

nnabla.Variable

entropy()

Compute the entropy of the distribution

Returns

Entropy of the distribution

Return type

nn.Variable

kl_divergence(q)

Compute the Kullback-Leibler divergence between this distribution and the given distribution q. This function computes KL(self||q).

Parameters

q (nnabla_rl.distributions.Distribution) – target distribution against which the KL divergence is computed

Returns

Kullback-Leibler divergence

Return type

nn.Variable

Raises

ValueError – target distribution’s type does not match with current distribution type.
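For two univariate Gaussians, KL(self||q) has a well-known closed form. The following scalar sketch is written in terms of (mean, ln_var) to match the constructor arguments; the function name is hypothetical and the library may compute the value differently (for example, summed over dimensions).

```python
import math


def gaussian_kl(mean_p, ln_var_p, mean_q, ln_var_q):
    """KL(p||q) for p = N(mean_p, exp(ln_var_p)), q = N(mean_q, exp(ln_var_q))."""
    var_p = math.exp(ln_var_p)
    var_q = math.exp(ln_var_q)
    return 0.5 * (ln_var_q - ln_var_p
                  + (var_p + (mean_p - mean_q) ** 2) / var_q
                  - 1.0)


same = gaussian_kl(0.0, 0.0, 0.0, 0.0)              # identical distributions
shifted = gaussian_kl(0.0, 0.0, 1.0, 0.0)           # unit variance, means differ by 1
forward = gaussian_kl(0.0, 0.0, 0.0, math.log(4.0))
backward = gaussian_kl(0.0, math.log(4.0), 0.0, 0.0)
```

The last two values differ, which shows that swapping self and q changes the result.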

log_prob(x)

Compute the log probability of given input

Parameters

x (nn.Variable) – Target value to compute the log probability

Returns

Log probability of given input

Return type

nn.Variable

mean()

Compute the mean of the distribution (if it exists)

Returns

Mean of the distribution

Return type

nn.Variable

Raises

NotImplementedError – The distribution does not have a mean

property ndim

The number of dimensions of the distribution

sample(noise_clip=None)

Sample a value from the distribution. If noise_clip is specified, the sampled value will be clipped in the given range. Applicability of noise_clip depends on underlying implementation.

Parameters

noise_clip (Tuple[float, float], optional) – float tuple of size 2 which contains the min and max value of the noise.

Returns

Sampled value

Return type

nn.Variable
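One plausible reading of noise_clip, given that the parameter is described as bounding the noise, is that the standard-normal noise is clipped before it is scaled and shifted. The scalar sketch below assumes exactly that; the function is hypothetical, and the actual behavior depends on the underlying implementation.

```python
import math
import random


def sample_gaussian(mean, ln_var, noise_clip=None, rng=random):
    """Return mean + sigma * eps, clipping eps to noise_clip if given.
    Assumption: the clip applies to the noise eps, not to the final sample."""
    eps = rng.gauss(0.0, 1.0)
    if noise_clip is not None:
        low, high = noise_clip
        eps = min(max(eps, low), high)
    sigma = math.exp(0.5 * ln_var)  # ln_var is log(sigma^2)
    return mean + sigma * eps


rng = random.Random(0)
# mean=1, sigma=2, noise in [-0.5, 0.5] keeps every sample within [0, 2].
clipped = [sample_gaussian(1.0, math.log(4.0), (-0.5, 0.5), rng)
           for _ in range(1000)]
```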

sample_and_compute_log_prob(noise_clip=None)

Sample a value from the distribution and compute its log probability.

Parameters

noise_clip (Tuple[float, float], optional) – float tuple of size 2 which contains the min and max value of the noise.

Returns

Sampled value and its log probability

Return type

Tuple[nn.Variable, nn.Variable]

sample_multiple(num_samples, noise_clip=None)

Sample multiple values from the distribution. A new axis is added between the first and second axes. Therefore, for a mean and variance of shape (batch_size, data_shape), the returned value has shape (batch_size, num_samples, data_shape).

If noise_clip is specified, sampled values will be clipped in the given range. Applicability of noise_clip depends on underlying implementation.

Parameters
  • num_samples (int) – number of samples per batch

  • noise_clip (Tuple[float, float], optional) – float tuple of size 2 which contains the min and max value of the noise.

Returns

Sampled value.

Return type

nn.Variable

class nnabla_rl.distributions.Softmax(z)

Bases: nnabla_rl.distributions.distribution.Distribution

Softmax distribution which samples a class index \(i\) according to the following probability.

\(i \sim \frac{\exp{z_{i}}}{\sum_{j}\exp{z_{j}}}\).

Parameters

z (nn.Variable) – logits \(z\). The dimension of the logits should equal the number of classes to sample from.
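The sampling rule above can be sketched with the standard library. The helpers below are hypothetical and operate on a plain list of logits for a single batch element; they illustrate the probability formula rather than the library's implementation.

```python
import math
import random


def softmax(z):
    """exp(z_i) / sum_j exp(z_j), shifted by max(z) for numerical stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(z, rng):
    """Draw a class index i with probability softmax(z)[i]."""
    return rng.choices(range(len(z)), weights=softmax(z), k=1)[0]


def most_probable_index(z):
    """The most probable class is the one with the largest logit."""
    return max(range(len(z)), key=lambda i: z[i])


rng = random.Random(0)
index = sample_index([0.5, 1.5, -1.0], rng)
```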

choose_probable()

Compute the most probable action of the distribution

Returns

Probable action of the distribution

Return type

nnabla.Variable

entropy()

Compute the entropy of the distribution

Returns

Entropy of the distribution

Return type

nn.Variable

kl_divergence(q)

Compute the Kullback-Leibler divergence between this distribution and the given distribution q. This function computes KL(self||q).

Parameters

q (nnabla_rl.distributions.Distribution) – target distribution against which the KL divergence is computed

Returns

Kullback-Leibler divergence

Return type

nn.Variable

Raises

ValueError – target distribution’s type does not match with current distribution type.

log_prob(x)

Compute the log probability of given input

Parameters

x (nn.Variable) – Target value to compute the log probability

Returns

Log probability of given input

Return type

nn.Variable

mean()

Compute the mean of the distribution (if it exists)

Returns

Mean of the distribution

Return type

nn.Variable

Raises

NotImplementedError – The distribution does not have a mean

property ndim

The number of dimensions of the distribution

sample(noise_clip=None)

Sample a value from the distribution. If noise_clip is specified, the sampled value will be clipped in the given range. Applicability of noise_clip depends on underlying implementation.

Parameters

noise_clip (Tuple[float, float], optional) – float tuple of size 2 which contains the min and max value of the noise.

Returns

Sampled value

Return type

nn.Variable

sample_and_compute_log_prob(noise_clip=None)

Sample a value from the distribution and compute its log probability.

Parameters

noise_clip (Tuple[float, float], optional) – float tuple of size 2 which contains the min and max value of the noise.

Returns

Sampled value and its log probability

Return type

Tuple[nn.Variable, nn.Variable]

sample_multiple(num_samples, noise_clip=None)

Sample multiple values from the distribution. A new axis is added between the first and second axes. Therefore, for a mean and variance of shape (batch_size, data_shape), the returned value has shape (batch_size, num_samples, data_shape).

If noise_clip is specified, sampled values will be clipped in the given range. Applicability of noise_clip depends on underlying implementation.

Parameters
  • num_samples (int) – number of samples per batch

  • noise_clip (Tuple[float, float], optional) – float tuple of size 2 which contains the min and max value of the noise.

Returns

Sampled value.

Return type

nn.Variable

class nnabla_rl.distributions.SquashedGaussian(mean, ln_var)

Bases: nnabla_rl.distributions.distribution.Distribution

Gaussian distribution whose output is squashed with tanh.

\(z \sim \mathcal{N}(\mu,\,\sigma^{2})\). \(out = \tanh{z}\).

Parameters
  • mean (nn.Variable) – mean \(\mu\) of underlying gaussian distribution.

  • ln_var (nn.Variable) – logarithm of the variance \(\sigma^{2}\). (i.e. ln_var is \(\log{\sigma^{2}}\))

Note

The log probability and kl_divergence of this distribution differ from those of the Gaussian distribution because the output is squashed.
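The difference in log probability comes from the change of variables out = tanh(z): the Gaussian log density is corrected by the log of the derivative 1 - tanh(z)^2. The scalar sketch below shows this standard correction under the assumption that x is the squashed value in (-1, 1); the function name is hypothetical and the library's implementation may differ in details such as numerical stabilization.

```python
import math


def squashed_gaussian_log_prob(x, mean, ln_var):
    """Log density of out = tanh(z), z ~ N(mean, exp(ln_var)), at x in (-1, 1)."""
    z = math.atanh(x)  # recover the pre-squash value
    var = math.exp(ln_var)
    gaussian = -0.5 * (math.log(2.0 * math.pi) + ln_var + (z - mean) ** 2 / var)
    # Change of variables: d tanh(z) / dz = 1 - tanh(z)^2 = 1 - x^2
    return gaussian - math.log(1.0 - x * x)


at_zero = squashed_gaussian_log_prob(0.0, 0.0, 0.0)  # correction term vanishes
at_half = squashed_gaussian_log_prob(0.5, 0.0, 0.0)  # correction is -log(0.75)
```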

choose_probable()

Compute the most probable action of the distribution

Returns

Probable action of the distribution

Return type

nnabla.Variable

log_prob(x)

Compute the log probability of given input

Parameters

x (nn.Variable) – Target value to compute the log probability

Returns

Log probability of given input

Return type

nn.Variable

property ndim

The number of dimensions of the distribution

sample(noise_clip=None)

Sample a value from the distribution. If noise_clip is specified, the sampled value will be clipped in the given range. Applicability of noise_clip depends on underlying implementation.

Parameters

noise_clip (Tuple[float, float], optional) – float tuple of size 2 which contains the min and max value of the noise.

Returns

Sampled value

Return type

nn.Variable

sample_and_compute_log_prob(noise_clip=None)

NOTE: To avoid sampling different random values for sample and log_prob, use nnabla.forward_all([sample, log_prob]). If you forward the two variables independently, the log_prob will correspond to a different sample, because different random values are sampled internally on each forward pass.

sample_multiple(num_samples, noise_clip=None)

Sample multiple values from the distribution. A new axis is added between the first and second axes. Therefore, for a mean and variance of shape (batch_size, data_shape), the returned value has shape (batch_size, num_samples, data_shape).

If noise_clip is specified, sampled values will be clipped in the given range. Applicability of noise_clip depends on underlying implementation.

Parameters
  • num_samples (int) – number of samples per batch

  • noise_clip (Tuple[float, float], optional) – float tuple of size 2 which contains the min and max value of the noise.

Returns

Sampled value.

Return type

nn.Variable

sample_multiple_and_compute_log_prob(num_samples, noise_clip=None)

NOTE: To avoid sampling different random values for sample and log_prob, use nnabla.forward_all([sample, log_prob]). If you forward the two variables independently, the log_prob will correspond to a different sample, because different random values are sampled internally on each forward pass.

class nnabla_rl.distributions.OneHotSoftmax(z)

Bases: nnabla_rl.distributions.softmax.Softmax

Softmax distribution that samples a one-hot vector whose element at class index \(i\) is 1. The class index is sampled according to the following distribution.

\(i \sim \frac{\exp{z_{i}}}{\sum_{j}\exp{z_{j}}}\).

Parameters

z (nn.Variable) – logits \(z\). The dimension of the logits should equal the number of classes to sample from.
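The one-hot output can be sketched as follows. These helpers are hypothetical and work on a plain list of logits for one batch element; in particular, the assumption that log_prob receives a one-hot vector (rather than a class index) is an inference from the class description, not something this page states.

```python
import math
import random


def sample_one_hot(z, rng):
    """Draw index i with probability softmax(z)[i] and return it as a one-hot list."""
    m = max(z)
    weights = [math.exp(v - m) for v in z]  # unnormalized softmax weights
    index = rng.choices(range(len(z)), weights=weights, k=1)[0]
    return [1.0 if i == index else 0.0 for i in range(len(z))]


def one_hot_log_prob(z, one_hot):
    """Log softmax probability of the class marked by the one-hot vector."""
    m = max(z)
    log_norm = m + math.log(sum(math.exp(v - m) for v in z))
    return z[one_hot.index(1.0)] - log_norm


rng = random.Random(0)
action = sample_one_hot([0.0, 1.0, 2.0], rng)
```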

choose_probable()

Compute the most probable action of the distribution

Returns

Probable action of the distribution

Return type

nnabla.Variable

log_prob(x)

Compute the log probability of given input

Parameters

x (nn.Variable) – Target value to compute the log probability

Returns

Log probability of given input

Return type

nn.Variable

property ndim

The number of dimensions of the distribution

sample(noise_clip=None)

Sample a value from the distribution. If noise_clip is specified, the sampled value will be clipped in the given range. Applicability of noise_clip depends on underlying implementation.

Parameters

noise_clip (Tuple[float, float], optional) – float tuple of size 2 which contains the min and max value of the noise.

Returns

Sampled value

Return type

nn.Variable

sample_and_compute_log_prob(noise_clip=None)

Sample a value from the distribution and compute its log probability.

Parameters

noise_clip (Tuple[float, float], optional) – float tuple of size 2 which contains the min and max value of the noise.

Returns

Sampled value and its log probability

Return type

Tuple[nn.Variable, nn.Variable]