Models

All models are derived from nnabla_rl.models.Model

Model

class nnabla_rl.models.model.Model(scope_name: str)[source]

Model Class

Parameters

scope_name (str) – the scope name of the model

deepcopy(new_scope_name: str) → nnabla_rl.models.model.Model[source]

Create a copy of the model. All of the model's parameters (if they exist) will also be copied.

Parameters

new_scope_name (str) – scope_name of parameters for newly created model

Returns

copied model

Return type

Model

Raises

ValueError – Given scope name is the same as this model's or already exists.

get_parameters(grad_only: bool = True) → Dict[str, nnabla._variable.Variable][source]

Retrieve the parameters associated with this model.

Parameters

grad_only (bool) – Retrieve only the parameters with need_grad = True. Defaults to True.

Returns

Parameter map.

Return type

parameters (OrderedDict)

load_parameters(filepath: Union[str, pathlib.Path]) → None[source]

Load model parameters from the given filepath.

Parameters

filepath (str or pathlib.Path) – parameter file path

save_parameters(filepath: Union[str, pathlib.Path]) → None[source]

Save model parameters to the given filepath.

Parameters

filepath (str or pathlib.Path) – parameter file path

property scope_name: str

Get scope name of this model.

Returns

scope name of the model

Return type

scope_name (str)
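The behavior described above (a scoped parameter map, grad_only filtering, and deepcopy rejecting a scope name that is already taken) can be sketched with a plain-Python stand-in. MockModel and its attributes are invented for this illustration and are not part of nnabla_rl:

```python
import copy
from collections import OrderedDict

# Illustrative stand-in for the Model contract above (NOT the real nnabla_rl class).
class MockModel:
    _existing_scopes = set()  # tracks scope names already in use

    def __init__(self, scope_name):
        self._scope_name = scope_name
        MockModel._existing_scopes.add(scope_name)
        self._parameters = OrderedDict()  # name -> (value, need_grad)

    @property
    def scope_name(self):
        return self._scope_name

    def get_parameters(self, grad_only=True):
        # When grad_only is True, return only parameters with need_grad = True.
        return OrderedDict(
            (name, value)
            for name, (value, need_grad) in self._parameters.items()
            if need_grad or not grad_only)

    def deepcopy(self, new_scope_name):
        # A scope name equal to this model's, or one already in use, is rejected.
        if (new_scope_name == self._scope_name
                or new_scope_name in MockModel._existing_scopes):
            raise ValueError("Given scope name is same as the model or already exists.")
        copied = MockModel(new_scope_name)
        copied._parameters = copy.deepcopy(self._parameters)
        return copied
```

The copy keeps the parameter values but registers them under the new scope name, which is why a duplicate scope is an error.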

List of Models

class nnabla_rl.models.Perturbator(scope_name)[source]

Bases: nnabla_rl.models.model.Model

Abstract class for perturbators.

A perturbator generates noise to add to the current state's action.

class nnabla_rl.models.Policy(scope_name: str)[source]

Bases: nnabla_rl.models.model.Model

class nnabla_rl.models.DeterministicPolicy(scope_name: str)[source]

Bases: nnabla_rl.models.policy.Policy

Abstract class for deterministic policy

This policy returns an action for the given state.

abstract pi(s: nnabla._variable.Variable) → nnabla._variable.Variable[source]
Parameters

s (nnabla.Variable) – State variable

Returns

Action for the given state

Return type

nnabla.Variable

class nnabla_rl.models.StochasticPolicy(scope_name: str)[source]

Bases: nnabla_rl.models.policy.Policy

Abstract class for stochastic policy

This policy returns a probability distribution of action for the given state.

abstract pi(s: nnabla._variable.Variable) → nnabla_rl.distributions.distribution.Distribution[source]
Parameters

s (nnabla.Variable) – State variable

Returns

Probability distribution of the action for the given state

Return type

nnabla_rl.distributions.Distribution
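The difference between the two policy interfaces can be sketched with plain-Python stand-ins: a deterministic policy's pi returns the action itself, while a stochastic policy's pi returns a distribution object that can be sampled. All class names here are invented for the illustration and do not use nnabla:

```python
import random

# Deterministic policy: pi maps a state directly to an action.
class MockDeterministicPolicy:
    def pi(self, s):
        return -0.5 * s  # the action itself (toy linear controller)

# Minimal distribution object, mirroring the sample() idea of
# nnabla_rl.distributions.Distribution (illustrative only).
class MockGaussianDistribution:
    def __init__(self, mean, stddev):
        self.mean = mean
        self.stddev = stddev

    def sample(self, rng=None):
        rng = rng or random.Random(0)
        return rng.gauss(self.mean, self.stddev)

# Stochastic policy: pi maps a state to a distribution over actions.
class MockStochasticPolicy:
    def pi(self, s):
        return MockGaussianDistribution(mean=-0.5 * s, stddev=0.1)
```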

class nnabla_rl.models.QFunction(scope_name: str)[source]

Bases: nnabla_rl.models.model.Model

Base QFunction Class

all_q(s: nnabla._variable.Variable) → nnabla._variable.Variable[source]

Compute the Q-values for each action for the given state.

Parameters

s (nn.Variable) – state variable

Returns

Q-values for each action for the given state

Return type

nn.Variable

argmax_q(s: nnabla._variable.Variable) → nnabla._variable.Variable[source]

Compute the action which maximizes the Q-value for the given state.

Parameters

s (nn.Variable) – state variable

Returns

action which maximizes the Q-value for the given state

Return type

nn.Variable

max_q(s: nnabla._variable.Variable) → nnabla._variable.Variable[source]

Compute the maximum Q-value for the given state.

Parameters

s (nn.Variable) – state variable

Returns

maximum Q-value for the given state

Return type

nn.Variable

abstract q(s: nnabla._variable.Variable, a: nnabla._variable.Variable) → nnabla._variable.Variable[source]

Compute the Q-value for the given state and action.

Parameters
  • s (nn.Variable) – state variable

  • a (nn.Variable) – action variable

Returns

Q-value for the given state and action

Return type

nn.Variable
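For a discrete action space, argmax_q and max_q follow directly from all_q. A minimal plain-Python sketch of that relationship (TabularQFunction is invented for this illustration and is not an nnabla_rl class):

```python
# Toy tabular Q-function over a discrete action space (illustrative only).
class TabularQFunction:
    def __init__(self, q_table):
        self._q_table = q_table  # state -> list of Q-values, one per action

    def all_q(self, s):
        return self._q_table[s]

    def argmax_q(self, s):
        # Index of the action with the highest Q-value.
        q_values = self.all_q(s)
        return max(range(len(q_values)), key=lambda a: q_values[a])

    def max_q(self, s):
        return max(self.all_q(s))

    def q(self, s, a):
        return self.all_q(s)[a]
```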

class nnabla_rl.models.ValueDistributionFunction(scope_name: str, n_action: int, n_atom: int, v_min: float, v_max: float)[source]

Bases: nnabla_rl.models.model.Model

Base value distribution class.

Computes the probabilities of the Q-value for each action. A value distribution function models the probabilities of the Q-value for each action by dividing the range between the minimum Q-value (v_min) and the maximum Q-value (v_max) into n_atom bins and assigning a probability to each bin.

Parameters
  • scope_name (str) – scope name of the model

  • n_action (int) – Number of actions used in the target environment.

  • n_atom (int) – Number of bins.

  • v_min (float) – Minimum value of the distribution.

  • v_max (float) – Maximum value of the distribution.

all_probs(s: nnabla._variable.Variable) → nnabla._variable.Variable[source]

Compute the probabilities of atoms for all possible actions for the given state.

Parameters

s (nn.Variable) – state variable

Returns

probabilities of atoms for all possible actions for the given state

Return type

nn.Variable

as_q_function() → nnabla_rl.models.q_function.QFunction[source]

Convert the value distribution function to a QFunction.

Returns

QFunction instance which computes the Q-values based on the probabilities.

Return type

nnabla_rl.models.q_function.QFunction

max_q_probs(s: nnabla._variable.Variable) → nnabla._variable.Variable[source]

Compute the probabilities of atoms for the action that maximizes the Q-value for the given state.

Parameters

s (nn.Variable) – state variable

Returns

probabilities of atoms for the action that maximizes the Q-value for the given state

Return type

nn.Variable

abstract probs(s: nnabla._variable.Variable, a: nnabla._variable.Variable) → nnabla._variable.Variable[source]

Compute the probabilities of atoms for the given state and action.

Parameters
  • s (nn.Variable) – state variable

  • a (nn.Variable) – action variable

Returns

probabilities of atoms for the given state and action

Return type

nn.Variable
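The relation between the value distribution and the Q-value (as used by as_q_function) can be sketched in plain Python: the support is divided into n_atom evenly spaced atoms between v_min and v_max, and the Q-value is the expectation over that support. This is a standalone sketch of the categorical (C51-style) construction, not nnabla_rl code:

```python
def atom_supports(v_min, v_max, n_atom):
    # Evenly spaced atom values z_i between v_min and v_max.
    delta = (v_max - v_min) / (n_atom - 1)
    return [v_min + i * delta for i in range(n_atom)]

def q_from_probs(probs, v_min, v_max):
    # Expected Q-value: sum_i p_i * z_i over the atom support.
    z = atom_supports(v_min, v_max, len(probs))
    return sum(p * z_i for p, z_i in zip(probs, z))
```

For example, with v_min = -10, v_max = 10 and three atoms, the support is [-10, 0, 10], and a distribution putting half its mass on 0 and half on 10 yields a Q-value of 5.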

class nnabla_rl.models.QuantileDistributionFunction(scope_name: str, n_action: int, n_quantile: int)[source]

Bases: nnabla_rl.models.model.Model

Base quantile distribution class.

Computes the quantiles of q-value for each action. Quantile distribution function models the quantiles of q value for each action by dividing the probability (which is between 0.0 and 1.0) into ‘n_quantile’ number of bins and assigning the n-quantile to n-th bin.

Parameters
  • scope_name (str) – scope name of the model

  • n_action (int) – Number of actions which used in target environment.

  • n_quantile (int) – Number of bins.

all_quantiles(s: nnabla._variable.Variable) → nnabla._variable.Variable[source]

Computes the quantiles of the Q-value for each action for the given state.

Parameters

s (nn.Variable) – state variable

Returns

quantiles of the Q-value for each action for the given state

Return type

nn.Variable

as_q_function() → nnabla_rl.models.q_function.QFunction[source]

Convert the quantile distribution function to a QFunction.

Returns

QFunction instance which computes the Q-values based on the quantiles.

Return type

nnabla_rl.models.q_function.QFunction

max_q_quantiles(s: nnabla._variable.Variable) → nnabla._variable.Variable[source]

Compute the quantiles of the Q-value for the action that maximizes the Q-value for the given state.

Parameters

s (nn.Variable) – state variable

Returns

quantiles of the Q-value for the action that maximizes the Q-value for the given state

Return type

nn.Variable

quantiles(s: nnabla._variable.Variable, a: nnabla._variable.Variable) → nnabla._variable.Variable[source]

Computes the quantiles of the Q-value for the given state and action.

Parameters
  • s (nn.Variable) – state variable

  • a (nn.Variable) – action variable

Returns

quantiles of the Q-value for the given state and action

Return type

nn.Variable
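Since each of the n_quantile bins carries equal probability mass, the Q-value recovered by as_q_function is simply the mean of the quantile values. A one-line plain-Python sketch of that idea (not nnabla_rl code):

```python
def q_from_quantiles(quantiles):
    # Each of the n_quantile bins has probability 1 / n_quantile,
    # so the expected Q-value is the mean of the quantile values.
    return sum(quantiles) / len(quantiles)
```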

class nnabla_rl.models.StateActionQuantileFunction(scope_name: str, n_action: int, K: int, risk_measure_function: Callable[[nnabla._variable.Variable], nnabla._variable.Variable] = <function risk_neutral_measure>)[source]

Bases: nnabla_rl.models.model.Model

State-action quantile function class.

Computes the return samples of the Q-value for each action. A state-action quantile function computes the return samples of the Q-value for each action for the given state, using sampled quantile thresholds (e.g. \(\tau\sim U([0,1])\)).

Parameters
  • scope_name (str) – scope name of the model

  • n_action (int) – Number of actions used in the target environment.

  • K (int) – Number of samples for the quantile threshold \(\tau\).

  • risk_measure_function (Callable[[nn.Variable], nn.Variable]) – Risk measure function which modifies the weightings of tau. Defaults to the risk-neutral measure, which leaves the taus unchanged.

all_quantile_values(s: nnabla._variable.Variable, tau: nnabla._variable.Variable) → nnabla._variable.Variable[source]

Compute the return samples for all actions for the given state and quantile threshold.

Parameters
  • s (nn.Variable) – state variable.

  • tau (nn.Variable) – quantile threshold.

Returns

return samples from the implicit return distribution for the given state, using tau.

Return type

nn.Variable

as_q_function() → nnabla_rl.models.q_function.QFunction[source]

Convert the state-action quantile function to a QFunction.

Returns

QFunction instance which computes the Q-values based on return samples.

Return type

nnabla_rl.models.q_function.QFunction

max_q_quantile_values(s: nnabla._variable.Variable, tau: nnabla._variable.Variable) → nnabla._variable.Variable[source]

Compute the return samples of the action that maximizes the Q-value for the given state, using the quantile threshold.

Parameters
  • s (nn.Variable) – state variable.

  • tau (nn.Variable) – quantile threshold.

Returns

return samples from the implicit return distribution that maximizes the Q-value for the given state, using tau.

Return type

nn.Variable

quantile_values(s: nnabla._variable.Variable, a: nnabla._variable.Variable, tau: nnabla._variable.Variable) → nnabla._variable.Variable[source]

Compute the return samples for the given state and action.

Parameters
  • s (nn.Variable) – state variable.

  • a (nn.Variable) – action variable.

  • tau (nn.Variable) – quantile threshold.

Returns

return samples from the implicit return distribution for the given state and action, using tau.

Return type

nn.Variable

sample_tau(shape: Optional[Iterable] = None) → nnabla._variable.Variable[source]

Sample quantile thresholds from a uniform distribution.

Parameters

shape (Tuple[int] or None) – shape of the quantile thresholds to sample. If None, the shape will be (1, K).

Returns

quantile thresholds

Return type

nn.Variable
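The sampling scheme above can be sketched in plain Python: thresholds \(\tau\) are drawn from a uniform distribution, the default risk measure leaves them unchanged, and a Q-value estimate is the mean of the return samples at those thresholds. The function names here are illustrative stand-ins, not the nnabla_rl API:

```python
import random

def risk_neutral_measure(taus):
    # Default risk measure: the taus pass through unchanged.
    return taus

def sample_tau(k, rng=None):
    # Quantile thresholds drawn from the uniform distribution U([0, 1)).
    rng = rng or random.Random(0)
    return [rng.random() for _ in range(k)]

def q_from_quantile_values(quantile_fn, s, taus):
    # Approximate the Q-value as the mean of the return samples
    # evaluated at the sampled thresholds.
    values = [quantile_fn(s, tau) for tau in taus]
    return sum(values) / len(values)
```

A risk-sensitive variant would replace risk_neutral_measure with a function that reweights the taus (e.g. toward low quantiles for risk-averse behavior).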

class nnabla_rl.models.reward_function.RewardFunction(scope_name: str)[source]

Bases: nnabla_rl.models.model.Model

Base reward function class

abstract r(s_current: nnabla._variable.Variable, a_current: nnabla._variable.Variable, s_next: nnabla._variable.Variable) → nnabla._variable.Variable[source]

Computes the reward for the given state, action and next state. One (or more) of the input variables may not be used in the actual computation.

Parameters
  • s_current (nnabla.Variable) – State variable

  • a_current (nnabla.Variable) – Action variable

  • s_next (nnabla.Variable) – Next state variable

Returns

Reward for the given state, action and next state.

Return type

nnabla.Variable

class nnabla_rl.models.VFunction(scope_name: str)[source]

Bases: nnabla_rl.models.model.Model

Base Value function class

abstract v(s: nnabla._variable.Variable) → nnabla._variable.Variable[source]

Compute the state value (V) for the given state.

Parameters

s (nn.Variable) – state variable

Returns

State value for the given state

Return type

nn.Variable

class nnabla_rl.models.Encoder(scope_name: str)[source]

Bases: nnabla_rl.models.model.Model

abstract encode(x: nnabla._variable.Variable, **kwargs) → nnabla._variable.Variable[source]

Encode the input variable to latent representation.

Parameters

x (nn.Variable) – encoder input.

Returns

latent variable

Return type

nn.Variable

class nnabla_rl.models.VariationalAutoEncoder(scope_name: str)[source]

Bases: nnabla_rl.models.encoder.Encoder

abstract decode(z: Optional[nnabla._variable.Variable], **kwargs) → nnabla._variable.Variable[source]

Reconstruct the variable from the latent representation.

Parameters

z (nn.Variable, optional) – latent variable. If None, a random sample will be used instead.

Returns

reconstructed variable

Return type

nn.Variable

abstract decode_multiple(z: Optional[nnabla._variable.Variable], decode_num: int, **kwargs)[source]

Reconstruct multiple latent representations.

Parameters

z (nn.Variable, optional) – latent variable. If None, random samples will be used instead.

Returns

Reconstructed input and latent distribution

Return type

nn.Variable

abstract encode_and_decode(x: nnabla._variable.Variable, **kwargs) → Tuple[nnabla_rl.distributions.distribution.Distribution, nnabla._variable.Variable][source]

Encode the input variable and reconstruct.

Parameters

x (nn.Variable) – encoder input.

Returns

latent distribution and reconstructed input

Return type

Tuple[Distribution, nn.Variable]

abstract latent_distribution(x: nnabla._variable.Variable, **kwargs) → nnabla_rl.distributions.distribution.Distribution[source]

Compute the latent distribution \(p(z|x)\).

Parameters

x (nn.Variable) – encoder input.

Returns

latent distribution

Return type

Distribution
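The relationships among latent_distribution, encode, decode, and encode_and_decode can be illustrated with a toy plain-Python codec. ToyVAE and ToyGaussian are invented for this sketch and perform no learning; they only mirror the interface contract:

```python
import random

# Minimal distribution stand-in with a sample() method (illustrative only).
class ToyGaussian:
    def __init__(self, mean, stddev):
        self.mean, self.stddev = mean, stddev

    def sample(self, rng):
        return rng.gauss(self.mean, self.stddev)

# Toy codec that mirrors the VariationalAutoEncoder interface contract.
class ToyVAE:
    def __init__(self, seed=0):
        self._rng = random.Random(seed)

    def latent_distribution(self, x):
        # p(z|x): a narrow Gaussian centered on a scaled input.
        return ToyGaussian(mean=0.5 * x, stddev=1e-3)

    def encode(self, x):
        # Encode = sample from the latent distribution.
        return self.latent_distribution(x).sample(self._rng)

    def decode(self, z=None):
        if z is None:  # fall back to a random latent sample, as documented
            z = self._rng.gauss(0.0, 1.0)
        return 2.0 * z  # inverse of the 0.5 scaling used by the encoder

    def encode_and_decode(self, x):
        # Returns the latent distribution and the reconstruction, in that order.
        distribution = self.latent_distribution(x)
        z = distribution.sample(self._rng)
        return distribution, self.decode(z)
```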